$Id: RELEASE-NOTES,v 1.145 2003/06/01 12:32:45 mikpe Exp $

RELEASE NOTES
=============

Version 2.5.4, 2003-06-01
- The generic-x86-with-TSC driver now uses rdpmc_read_counters
  and p6_write_control instead of its own procedures.
- K8 docs are now available. Updated comment in x86.c accordingly.
- P4 OVF_PMI+FORCE_OVF counters didn't work at all, resulting in
  BUG messages from the driver since identify_overflow failed to
  detect which counters had overflowed, and vperfctr_ihandler
  left the vperfctr in an inconsistent state. This works now.
  However, hardware quirks make this configuration only useful
  for one-shot counters, since resuming generates a new interrupt
  and the faulting instruction again doesn't complete. The same
  problem can occur with regular OVF_PMI counters if ireset is
  a small-magnitude value, like -5.
  This is a user-space problem; the driver survives.
- On P4, OVF_PMI+FORCE_OVF counters must have an ireset value of -1.
  This allows the regular overflow check to also handle FORCE_OVF
  counters. Not having this restriction would lead to MAJOR
  complications in the driver's "detect overflow counters" code.
  There is no loss of functionality since the ireset value doesn't
  affect the counter's PMI rate for FORCE_OVF counters.
- Moved P4 APIC_LVTPC reinit from p4_isuspend() to identify_overflow().
  Reduces context-switch overheads when i-mode counters are active.
- Corrected vperfctr_suspend()'s precondition.
- Corrected comment in <asm/perfctr.h> to state that ireset[]
  values must be negative rather than non-positive.
- Made 'perfctr_cpu_name' __initdata, like its predecessor.
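
The ireset[] rules in this release can be sketched in user-space C as
follows. This is an illustration only: check_ireset() and
counter_overflowed() are hypothetical helpers, not driver functions,
and the sign test is an assumption about how an up-counting counter
started at a negative value reveals an overflow.

```c
#include <errno.h>

/* Hypothetical helper: validate one i-mode counter's restart value.
 * From the notes: ireset must be negative, and on P4 a counter with
 * OVF_PMI+FORCE_OVF must use exactly -1. */
static int check_ireset(int ireset, int p4_force_ovf)
{
	if (ireset >= 0)
		return -EINVAL;
	if (p4_force_ovf && ireset != -1)
		return -EINVAL;
	return 0;
}

/* Assumed overflow test: a counter started at a negative value has
 * overflowed once its low 32 bits, read as signed, are no longer
 * negative. */
static int counter_overflowed(unsigned int pmc_low32)
{
	return (int)pmc_low32 >= 0;
}
```

Under that assumption, a counter started at -1 reaches 0 after a
single event, so the same sign check would also cover FORCE_OVF
counters.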

Version 2.5.3.1, 2003-05-21
- Replaced 'char *perfctr_cpu_name[]' by 'char *perfctr_cpu_name'.
  This is needed for x86-64 and other non-x86 architectures.
- Changed <asm-x86_64/perfctr.h> to use 'long long' for 64-bit sums.
  This doesn't change the ABI, but improves user-space source code
  compatibility with 32-bit x86.
- Removed the !defined(set_cpus_allowed) check added to compat24.h
  in 2.5.3. It's wrong for SMP builds with modules and MODVERSIONS,
  since the set_cpus_allowed() emulation function becomes a #define
  from include/linux/modules/x86_setup.ver. Instead add the already
  used HAVE_SET_CPUS_ALLOWED #define to include/linux/config.h in
  the kernel patch, but make it conditional on CONFIG_X86_64.

Version 2.5.3, 2003-05-16
- Added detection code for Pentium M. MISC_ENABLE_PERF_AVAIL is
  now checked on both P4 and Pentium M.
- Added x86_64 driver code. Both x86_64.c and asm-x86_64/perfctr.h
  are basically simplified versions of corresponding x86 files,
  with P5 and P4 support removed, 2.2 kernel support removed, and
  'long long' for sums replaced by 'long'. The last change is
  painful for user-space and may be reverted.
- compat24.h: don't define set_cpus_allowed() if already #defined,
  workaround for RawHide's 2.4.20-9.2 x86_64 kernel.
- Removed list of supported CPUs from Kconfig. That information
  belongs elsewhere (and it's a pain to maintain for 2.2/2.4).

Version 2.5.2, 2003-04-13
- Minor cleanup: use PROC_I() unconditionally in virtual.c,
  implement trivial compat macro in compat24.h.
- Updated power management code for the local APIC and NMI
  watchdog driver model changes in kernel 2.5.67.
  The suspend/resume procedures are still no-ops, however.
  This revealed a bug in the lapic_nmi_watchdog resume code:
  it resumes the lapic_nmi_watchdog even when it was disabled
  before suspend. Perfctr's 2.5.67 kernel patch includes a fix.
- perfctr_sample_thread() is now used also on UP. Anton Ertl's
  2.26GHz UP P4 managed to execute a process for more than 2^32
  cycles before suspending it, causing TSC inaccuracies.
- RH9's 2.4.20-8 kernel changed cpu_online(), put_task_struct() and
  remap_page_range() to be more like in 2.5 kernels, and moved the
  declaration of ptrace_check_attach() from mm.h to ptrace.h, also
  like in 2.5 kernels, requiring fixes to compat24.h and x86_setup.c.
- Added note in x86.c about the new Pentium M processor.
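
As a rough check of the 2^32-cycle figure above, the sketch below
computes how long a CPU of a given clock rate takes to execute 2^32
cycles. seconds_to_wrap32() is a hypothetical helper, and the sketch
assumes the TSC ticks once per core clock cycle.

```c
/* Hypothetical helper: seconds until a 32-bit cycle count wraps,
 * assuming one TSC tick per clock cycle at the given frequency. */
static double seconds_to_wrap32(double hz)
{
	return 4294967296.0 / hz;	/* 2^32 cycles */
}
```

For a 2.26GHz clock this gives roughly 1.9 seconds, so a CPU-bound
process that is not suspended for about two seconds exceeds the limit.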

Version 2.5.1, 2003-03-23
- Fix P4 HT initialisation. I've seen several boot logs from
  people running MP P4 Xeons with HT disabled: this produces
  an ugly "restricting access for CPUs 0x0" message, and would
  cause P4 HT init to unnecessarily return error in older kernels
  lacking set_cpus_allowed(). Now only print the message or
  signal error if non-zero siblings actually are found.
- The set_cpus_allowed() emulation doesn't compile in 2.4
  kernels older than 2.4.15 due to the p->cpus_running field.
  Updated version checks to skip it in 2.4.x when x<15.
- Fix set_cpus_allowed() emulation compile error on BUG_ON()
  in 2.4 kernels older than 2.4.19.
- Added Nehemiah note/reminder in x86.c:centaur_init().
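
The "2.4.x when x<15" check above can be sketched as a plain C
predicate. KERNEL_VERSION() uses the same encoding as the kernel's
<linux/version.h>; want_set_cpus_allowed_emulation() is a hypothetical
name, and the upper bound at 2.5.0 is an assumption. The driver itself
would make this decision with the preprocessor, and kernels that
already provide set_cpus_allowed() are not modelled here.

```c
/* Same encoding as KERNEL_VERSION() in <linux/version.h>. */
#define KERNEL_VERSION(a, b, c)	(((a) << 16) + ((b) << 8) + (c))

/* Hypothetical predicate: is the set_cpus_allowed() emulation wanted
 * for this kernel version code?  Per the notes it is skipped for
 * 2.4.x with x < 15; limiting it to 2.4 is an assumption. */
static int want_set_cpus_allowed_emulation(unsigned int code)
{
	return code >= KERNEL_VERSION(2, 4, 15) &&
	       code < KERNEL_VERSION(2, 5, 0);
}
```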

Version 2.5.0, 2003-03-10
- Reverted the 2.5.0-pre2 change that replaced the PERFCTR_INFO
  ioctl by read(): it made the API look too weird.
  Added a PERFCTR_ABI ioctl which only retrieves 'abi_version'.
- Cleaned up struct perfctr_info: renamed abi_magic to abi_version,
  and version to driver_version. Renamed PERFCTR_*_MAGIC too.
- Cleaned up struct perfctr_cpu_control: moved evntsel_aux[]
  into the p4 sub-struct and renamed it as escr[]. Only P4 needs
  it anyway, and the new name clarifies its purpose.
- Renumbered the vperfctr ioctls to the 8-15 range (8-11 are used)
  and reserved 0-7 (0-1 are used) for generic ioctls.
- Added 'use_nmi' field to struct gperfctr_control, reserved for
  future use if/when support for i-mode gperfctrs is implemented.
- Replaced some preempt/smp_call_function combinations with 2.5.64's
  new on_each_cpu() construct. Added compatibility definitions to
  compat24.h and compat22.h.

Version 2.5.0-pre2, 2003-03-03
- Added ABI version to perfctr_info. Replaced PERFCTR_INFO ioctl
  by read() on the fd, since that allows reading the ABI version
  even in the case of a version mismatch. Removed binary layout
  magic number from vperfctr_state. Rearranged perfctr_info to
  make the 'long' fields 8-byte aligned.
- Added #ifdef CONFIG_KPERFCTR to <linux/perfctr.h> to ensure
  that <asm/perfctr.h> isn't included unless CONFIG_KPERFCTR=y.
  This allows the patched kernel source to compile cleanly also
  in archs not yet supported by perfctr.
- Removed PERFCTR_PROC_PID_MODE #define and replaced it with
  /*notype*/S_IRUSR in the patch files.
- Added perfctr_vector_init() to <asm-i386/perfctr.h>. Cleaned
  up arch/i386/kernel/i8259.c patch.
- Removed apic_lvtpc_irqs[] array. Removed irq.c patch.
- Updated CONFIG_PERFCTR_INIT_TESTS help text to match reality.
- Kernel 2.4.21-pre5 added set_cpus_allowed(), which required
  fixing compat24.h and x86_setup.c.
- Fixed init.c for kernel 2.5.63 removing EXPORT_NO_SYMBOLS.
- Cleaned up compat.h by moving 2.2/2.4 stuff to separate files.

Version 2.5.0-pre1, 2003-02-19
- Repair global perfctr API: the target CPUs are now explicit
  in the calls to write control and read state. Global perfctrs
  now work on 2.5 SMP kernels (which no longer have smp_num_cpus
  or cpu_logical_map()), and HT P4s (asymmetric MPs).
- struct perfctr_info has new bitmask fields for the set of CPUs
  (cpu_online_map) and forbidden CPUs; dropped the nrcpus field.
- add cpu_online() compat macro to compat.h
- VPERFCTR_STOP is subsumed by VPERFCTR_CONTROL. Removed it.
- Detect K8 as K8 not K7. They are not identical.
- Makefile cleanup: moved 2.4/2.2 kernel stuff to Makefile24.
- Makefile fix: removed export-objs for 2.5 kernels.
- Kconfig fix: don't mention obsolete .o module suffix.

Version 2.4.5, 2003-02-09
- Fixed two minor compile warnings in x86_tests.c for 2.5 kernels.

Version 2.4.4, 2003-01-18
- Fixed a bug in iresume() where an interrupt-mode counter could
  increment unexpectedly, and also miss the overflow interrupt.
  The following setup would cause the problem:
      P1 has EVNTSELn in non-interrupt mode, counting some high-
  frequency event (e.g. INST_RETIRED) in kernel-mode. P2 has
  EVNTSELn in interrupt-mode, counting some low-frequency event
  (e.g. MMX_ASSIST) in user-mode. P1 suspends. Since EVNTSELn is
  in non-interrupt mode, it is not disabled. P2 resumes. First
  iresume() finds that the CPU cache ID is not P2's, so it reloads
  PERFCTRn with P2's restart value. Then write_control() reloads
  EVNTSELn with P2's EVNTSEL. At this point, P2's PERFCTRn has been
  counting with P1's EVNTSELn since iresume(), so it will no longer
  equal P2's restart value. And if PERFCTRn overflowed, the overflow
  will go undetected since P1's EVNTSELn was in non-interrupt mode.
      To avoid this problem, iresume() now ensures that a counter's
  control register is disabled before reloading the counter.
- Fixed some ugly log messages from the new HT P4 init code:
  * forbidden_mask would be printed as "0X<mask>" (capital X)
  * finalise_backpatching() could trigger a BUG! printk from
    p4_write_control() if the CPU the init code runs on was
    in the forbidden set. At init-time this is not an error.
    Avoided this by temporarily resetting the forbidden_mask.
- Added preliminary support for AMD K8 processors with the
  regular 32-bit x86 kernel. The K8 performance counters appear
  to be identical or very similar to the K7 performance counters.
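
The iresume() ordering problem described above can be reproduced with
a toy model of one counter/control register pair. Everything here
(struct toy_pmc and the toy_* functions) is hypothetical and only
mimics the order of register writes; it is not the driver's code.

```c
/* Toy model of one EVNTSEL/PERFCTR pair; not driver code. */
struct toy_pmc {
	unsigned int evntsel;	/* nonzero means "counting enabled" */
	int perfctr;		/* current counter value */
};

/* Events only count while the control register is enabled. */
static void toy_events(struct toy_pmc *hw, int n)
{
	if (hw->evntsel)
		hw->perfctr += n;
}

/* Old order: reload the counter while the previous owner's control
 * is still live, then install the new control. */
static int toy_iresume_old(struct toy_pmc *hw, int restart,
			   unsigned int evntsel, int stray)
{
	hw->perfctr = restart;
	toy_events(hw, stray);	/* counted with the old EVNTSEL */
	hw->evntsel = evntsel;
	return hw->perfctr;
}

/* New order: disable the control register before reloading. */
static int toy_iresume_new(struct toy_pmc *hw, int restart,
			   unsigned int evntsel, int stray)
{
	hw->evntsel = 0;
	hw->perfctr = restart;
	toy_events(hw, stray);	/* ignored: counter is disabled */
	hw->evntsel = evntsel;
	return hw->perfctr;
}
```

In the old order the counter drifts away from its restart value by
however many stray events arrive between the two writes; in the new
order it still equals the restart value when the new control is
installed.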

Version 2.4.3, 2002-12-11
- Added x86.c:perfctr_cpus_forbidden_mask. This bitmask describes
  the set of CPUs that must not access the perfctrs. On HT P4 MPs,
  only logical CPU #0 in each package is allowed access -- this
  avoids the resource conflict that would occur if both logical
  processors were to access the perfctrs. In other cases (UP or
  non-HT-P4 MPs) the mask is zero.
- vperfctr_control() now calls set_cpus_allowed() to ensure that
  the task stays away from CPUs in perfctr_cpus_forbidden_mask.
  This is racy with sys_sched_setaffinity(), and possibly some
  of the kernel's internal set_cpus_allowed() calls, but the race
  is unlikely to occur in current 2.4 kernels.
- Cleaned up the parameter passing protocol between vperfctr_ioctl()
  and the individual vperfctr "system call" procedures.
- Added safety check in global.c to disallow global-mode perfctrs
  on asymmetric MPs until the API has been fixed.
- Added set_cpus_allowed() implementation for 2.4 kernels, except
  those that already have it as indicated by HAVE_SET_CPUS_ALLOWED:
  this symbol is added to <linux/config.h> by the kernel patch.
- 2.2 kernels can't enforce CPU affinity masks, so x86.c warns if
  a HT P4 MP runs a 2.2 kernel, and falls back to generic x86 mode.
  Added dummy set_cpus_allowed() macro for 2.2 kernels.
- x86_compat.h now implements cpuid_ebx() and cpu_has_ht for old kernels.
- Makefile cleanup: Rules.make is obsolete in 2.5.
- Compile fixes in x86.c and virtual_stub.c: <linux/fs.h> needs to
  be included explicitly for the 2.5.50 kernel.
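
A minimal sketch of the forbidden-CPU bitmask idea follows. It ASSUMES
that the logical CPUs of one package are numbered consecutively
(0,1 | 2,3 | ...), which a real kernel does not guarantee; the toy_*
names are hypothetical and nr_cpus must not exceed the number of bits
in an unsigned long.

```c
/* Hypothetical: forbid every logical CPU except the first one in
 * each package, assuming consecutive sibling numbering. */
static unsigned long toy_forbidden_mask(unsigned int nr_cpus,
					unsigned int siblings_per_package)
{
	unsigned long mask = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (siblings_per_package > 1 &&
		    cpu % siblings_per_package != 0)
			mask |= 1UL << cpu;
	return mask;
}

/* CPUs a perfctr-using task may run on: its current affinity mask
 * minus the forbidden set. */
static unsigned long toy_allowed_cpus(unsigned long cpus_allowed,
				      unsigned long forbidden)
{
	return cpus_allowed & ~forbidden;
}
```

With one logical CPU per package (UP or non-HT MP) the mask comes out
as zero, matching the description above.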

Version 2.4.2, 2002-11-25
- Fixed virtual.c:inc_nrctrs() to handle the -EBUSY case correctly.
  If the HW was busy (e.g. global running), then the first attempt
  to open a vperfctr would fail but further attempts would succeed.
  Updated error propagation to distinguish -EBUSY from -ENOMEM.
- Updated global.c for preempt-safety.
- Made the driver safe for preemptible kernels. This required a lot
  of analysis, but resulted in relatively few actual code changes.
  (Backport from the perfctr-3.1 branch.)
- Ported to 2.5.48: Replaced MOD_INC_USE_COUNT by try_module_get()
  and MOD_DEC_USE_COUNT by module_put(). Updated compat.h.
- Ported to 2.5.45: added Kconfig, removed Config.help.

Version 2.4.1, 2002-10-12
- RedHat 8.0's 2.4.18-14 kernel does EXPORT_SYMBOL(cpu_khz) while
  the vanilla 2.4.18 does not. This clashes with x86_setup.c's
  EXPORT_SYMBOL(cpu_khz). I've found no easy way to distinguish
  between these kernels at C preprocessing time, so I changed
  x86_setup.c to define a trivial perfctr_cpu_khz() function and
  EXPORT_SYMBOL that one instead.

Version 2.4.0, 2002-09-26
- Config.help updated to state that Pentium 4 is supported.
- 2.5.32 moved ptrace_check_attach() declaration to <linux/ptrace.h>.
- Removed redundant /proc/<pid>/perfctr access control check
  from vperfctr_stub_open(). Since 2.4.0-pre1 this check didn't
  match the real one, which prevented remote opens when the
  driver was built as a module.

Version 2.4.0-pre2, 2002-08-27
- vperfctr_control() now allows the user to specify that some PMC
  sums are not to be cleared when updating the control.
  There is a new bitmap field `preserve' in struct vperfctr_control:
  if bit i is set then PMC(i)'s sum is not cleared.
  `preserve' is a simple `unsigned long' for now, since this type
  fits all currently known CPU types.
  This change breaks binary compatibility, but user-space code which
  clears the entire control record before filling in relevant fields
  will continue to work as before after a recompile.
  This feature removes a limitation which some people felt was a
  problem for some usage scenarios.
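
The `preserve' semantics can be sketched as below. TOY_NRCTRS and
toy_clear_sums() are hypothetical names chosen for the illustration;
only the rule "bit i set means PMC(i)'s sum is kept" comes from the
notes.

```c
#define TOY_NRCTRS 4	/* illustrative counter count, not the driver's */

/* Clear each PMC sum unless its bit is set in 'preserve'. */
static void toy_clear_sums(unsigned long long sum[TOY_NRCTRS],
			   unsigned long preserve)
{
	unsigned int i;

	for (i = 0; i < TOY_NRCTRS; i++)
		if (!(preserve & (1UL << i)))
			sum[i] = 0;
}
```

A control record that was zeroed before use has preserve == 0, so all
sums are cleared exactly as before the change.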

Version 2.4.0-pre1, 2002-08-12
- Initial implementation of a new remote-control API for virtual
  per-process perfctrs. A monitor process may access a target
  process' perfctrs via /proc/pid/perfctr and operations on that
  file, if the monitor holds the target under ptrace ATTACH control.
  Updated virtual.c to allow remote access.
  Updated x86.c:perfctr_cpu_ireload() to work also in the remote
  control case on SMP machines.

Version 2.3.12, 2002-08-12
- Trivial comment fixes in compat.h and x86_compat.h.
- Removed __vperfctr_sample(), vperfctr_stub.sample, and bug_sample()
  from UP builds, since they are needed only on SMP.

Version 2.3.11, 2002-07-21
- Accumulated sums are now maintained for interrupt-mode perfctrs.
  User-space can use the standard syscall-less algorithm for computing
  these counters' current sums, should that be needed.

Version 2.3.10, 2002-07-19
- Added PERFCTR_X86_INTEL_P4M2 CPU type for Model 2 P4s, since
  they have ESCR Event Mask changes in a few events.
- The driver now supports replay tagging events on P4, using the
  pebs_enable and pebs_matrix_vert control fields added in 2.3.8.
- Some Pentium MMX and Pentium Pro processors have an erratum
  (Pentium erratum #74, Pentium Pro erratum 26) which causes SMM
  to shut down if CR4.PCE is set. intel_init() now clears the
  RDPMC feature on the affected steppings, to avoid the problem.
- perfctr_cpu_release() now clears the hardware registers and
  invalidates the per-cpu cache. This should allow the counter
  hardware to power down when not used, especially on P4.
- Callers of update_control() have no active i-mode counters.
  Documented this as a precondition, and changed update_control()
  to not call isuspend(). update_control() no longer needs hardware
  access, which should ease a port to CONFIG_PREEMPT=y.

Version 2.3.9, 2002-06-27
- Updated p4_escr_addr() in x86.c to match the latest revision of
  Intel's IA32 Volume 3 manual, #245472-007. An error in previous
  revisions of this document caused the driver to program the wrong
  ESCR in some cases. (CCCRs 12/13/16 with ESCR_SELECT(2) were mapped
  to SSU_ESCR0 instead of RAT_ESCR0, affecting the uop_type event.)

Version 2.3.8, 2002-06-26
- Added counter overflow interrupt support for Intel P4.
- 2.5.23 dropped smp_num_cpus and cpu_logical_map(). Added
  temporary workarounds to x86.c and global.c to allow compilation
  and testing under 2.5. May have to change the API (esp. global's)
  to be based on the sparse cpu_online_map instead.
- RedHat's 2.4.9-34 defines cpu_relax(). Updated compat.h.
- Added pebs_enable and pebs_matrix_vert fields (currently unused)
  to perfctr_cpu_control to support replay tagging events on P4.
  Updated the perfctr_cpu_state binary layout magic number.
- Silenced redefinition warnings for MSR_P6_PERFCTR0 and cpu_has_mmx.
- Updated Makefile for the 2.5.19 kernel's Makefile changes.
- Merged the P6 and K7 isuspend/iresume/write_control driver code.
- Added a VC3 specific clear_counters() procedure.
- Removed pointless code from perfctr_cpu_identify_overflow().
- Removed _vperfctr_get/set_thread() wrappers and thread->perfctr
  clobber checks from the DEBUG code. Removed unused "ibuf" and
  obsolete si_code fields from vperfctr state and control objects.
  Updated the vperfctr state magic number.
- Fixed the CONFIG_PREEMPT anti-dependency check in Config.in.
- vperfctr_control() now preserves the TSC sum on STOP;CONTROL
  transitions. The failure to do this caused problems for the
  PAPI P4 support being developed.

Version 2.3.7, 2002-04-14
- Kernel 2.5.8-pre3 changed the way APIC/SMP interrupt entries
  are defined. Defining these with asm() in C is no longer
  practical, so the kernel patch for 2.5.8-pre3 now defines
  the perfctr interrupt entry in arch/i386/kernel/entry.S.
- Permit use of cascading counters on P4: in the slave counter
  one sets the CASCADE flag instead of the ENABLE flag.
- Added P4 hyperthreading bit field definitions.
- Preliminary infrastructure to support a new remote-control
  interface via ptrace(). Updates to compat.h, virtual.c,
  virtual_stub.c, and x86_setup.c. ptrace_check_attach()
  emulation for older kernels is in x86_setup.c since
  virtual_stub.c isn't compiled if the driver isn't a module.

Version 2.3.6, 2002-03-21
- Rewrote sys_vperfctr_control() to do a proper suspend before
  updating the control, and to skip trying to preserve the TSC
  start value around the resume. This cleaned up the code and
  eliminated the bogus "BUG! resuming non-suspended perfctr"
  warnings that control calls to active perfctrs caused.
- Rewrote sys_vperfctr_iresume() to not preserve the TSC start
  value around the resume. Since we had just done a suspend(),
  this would cause double-accounting of the TSC.

Version 2.3.5, 2002-03-17
- Added detection of the VIA C3 Ezra-T processor.
- CPU detection now uses current_cpu_data instead of boot_cpu_data,
  to avoid the boot_cpu_data.x86_vendor bug which is present in
  all current 2.2/2.4/2.5 kernels. The bug caused the x86_vendor
  field to be cleared on SMP machines, which in turn tricked the
  driver into identifying MP AMD K7 machines as MP Intel P6, with
  disastrous results when the wrong MSRs were programmed.
- Updated compat.h for /proc/<pid>/ inode change in 2.5.4.
- Added a check to prevent building on preemptible 2.4/2.5 kernels,
  since the driver isn't yet safe for those.
- Put perfctr's configuration help text in Config.help in this
  directory: kernel 2.5.3-pre5 changed from having a common
  Configure.help file to having local Config.help files.

Version 2.3.4, 2002-01-23
- Updated virtual.c for remap_page_range() change in 2.5.3-pre1.
  Added emulation for older kernels to compat.h.
- Permit use of tagging on P4 for at-retirement counting. This may
  not yet work as expected, since up-stream (tag producing) counters
  aren't disabled at context switches: a process may therefore see
  more tagged uops than expected.
- Fixed uses of __FUNCTION__ to comply with changes in GCC 3.0.3.

Version 2.3.3, 2001-12-31
- Minor x86.c cleanup: reordered function definitions so that
  write_control comes after isuspend/iresume: this makes it easier
  to follow the runtime control flow.
- Fixed isuspend()/iresume()'s broken cache checking protocol. The
  old protocol didn't handle process migration across CPUs in SMP
  machines correctly, as illustrated by the following scenario:
      P1 runs on CPU1 and suspends. P1 and CPU1 now have the same
  cache id (->k1.id). P1 is resumed and suspended on CPU2: the state
  in CPU1 is now stale. Then P1 is resumed on CPU1, and no other
  process has been using CPU1's performance counters since P1's last
  suspend on CPU1. The old protocol would see matching cache ids and
  that P1's i-mode EVNTSELs are stopped, so it would accept the cache
  and resume P1 with CPU1's stale PERFCTRS values.
      In the new protocol isuspend() records the active CPU in the
  state object, and iresume() checks if both the CPU and the control
  id match. The new protocol is also simpler since iresume() no longer
  checks if the i-mode EVNTSELs are cleared or not.
- P6 nasty i-mode to a-mode context switch bug fixed: p6_isuspend()
  used to simply clear EVNTSEL0's Enable flag in order to stop all
  i-mode counters. Unfortunately, that was insufficient as shown by
  the following case (which actually happened).
      P1 has EVNTSEL0 in a-mode and EVNTSEL1 in i-mode. P1 suspends:
  PERFCTR1 is stopped but EVNTSEL1 is still in i-mode. P2 has EVNTSEL0
  in a-mode and no EVNTSEL1. P2 resumes and updates EVNTSEL0. This
  activates not only P2's PERFCTR0 but also the dormant PERFCTR1. If
  PERFCTR1 overflows, then P2 will receive an unexpected interrupt. If
  PERFCTR1 doesn't overflow, but P2 suspends and P1 resumes, then P1
  will find that PERFCTR1 has a larger than expected value.
      p6_isuspend() and p6_iresume() were changed to ignore the global
  Enable flag and to disable/enable each i-mode EVNTSEL individually,
  just like how it's done on the K7.
- x86.c cleanups: P5MMX, MII, C6, VC3, P6, K7, and P4 now all
  use the same rdpmc_read_counters() method. VIA C3 now uses
  p6_write_control() instead of its own method.
- Removed "pmc_map[] must be identity" restriction from P6 and K7.
  The API uses the virtual counter index to distinguish a-mode
  and i-mode counters, but P6 events aren't entirely symmetric:
  this led to some strange cases with the old pmc_map[] rule.
      P6 and K7 isuspend() now need access to the control, so
  update_control() and its callers had to be changed to allow it
  to isuspend() _before_ the new control is installed.
- P4 write_control fixes: changed the ESCR cache to be indexed by
  MSR offset from 0x3A0, and changed P4 write_control to index the
  CCCR/ESCR cache with physical instead of virtual indices. Added
  call to debug_evntsel_cache(), after updating it for pmc_map[].
- Added P4 and Generic support to x86_tests.c, and some cleanups.
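
The cache checking scenario above can be replayed with a toy model.
The structures and toy_* functions are hypothetical stand-ins for the
per-CPU cache and the per-process state object; CPU1 and CPU2 in the
text correspond to indices 0 and 1 here.

```c
#define TOY_NR_CPUS 2

struct toy_cpu_cache { int id; };	/* id of the state last loaded */
struct toy_state { int id; int isuspend_cpu; };

static struct toy_cpu_cache toy_cache[TOY_NR_CPUS];

/* Suspend on 'cpu': that CPU's registers now hold this state. */
static void toy_isuspend(struct toy_state *st, int cpu)
{
	toy_cache[cpu].id = st->id;
	st->isuspend_cpu = cpu;	/* recorded by the new protocol */
}

/* Old check: matching ids alone. */
static int toy_cache_ok_old(const struct toy_state *st, int cpu)
{
	return toy_cache[cpu].id == st->id;
}

/* New check: ids match AND this is the CPU of the last suspend. */
static int toy_cache_ok_new(const struct toy_state *st, int cpu)
{
	return toy_cache[cpu].id == st->id && st->isuspend_cpu == cpu;
}
```

After a suspend on the first CPU followed by a suspend on the second,
the old check still accepts the first CPU's stale cache, while the new
check rejects it and accepts only the second CPU's.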

Version 2.3.2, 2001-11-19
- P4 fix: the mapping from CCCR 17 to its associated ESCRs was
  wrong due to an off-by-one error in x86.c:p4_escr_addr().
- P4 fix: also clear the PEBS MSRs when initialising the driver.
- Minor cleanup in x86.c: replaced the "clear MSRs" loops with
  calls to a helper procedure.

Version 2.3.1, 2001-11-06
- Microscopic P4 cleanups. Testing on my new P4 box has confirmed
  that the PMAVAIL flag in MSR_IA32_MISC_ENABLE is read-only.

Version 2.3, 2001-10-24
- Added support for multiple interrupt-mode virtual perfctrs
  with automatic restart. Added an identify_overflow() method
  to x86.c to identify and reset the overflowed counters.
  Added checks to ensure that the user-specified restart values
  for interrupt-mode counters are negative.
  Updated virtual.c's signal delivery interface to pass a
  bitmask describing which counters overflowed; the siginfo
  si_code is now fixed as SI_PMC_OVF (fault-class).
- Fixed some typos in x86.c. Added a note about the C3 Ezra.
- Added EXPORT_NO_SYMBOLS to init.c, for compatibility with
  announced changes in modutils 2.5.

Version 2.2, 2001-10-09
- Added preliminary support for the Pentium 4. Only basic stuff
  for now: no cascading counters, overflow interrupts, tagged
  micro-ops, or use of DS/PEBS. The code compiles but hasn't been
  tested on an actual Pentium 4.

Version 2.1.4, 2001-09-30
- No driver-level changes.

Version 2.1.3, 2001-09-13
- Fixed a compilation problem where virtual_stub couldn't be compiled
  in modular kernels older than 2.2.20pre10 if KMOD was disabled, due
  to an incompatible stub definition of request_module().
- Replaced most occurrences of "VIA Cyrix III / C3" with "VIA C3".

Version 2.1.2, 2001-09-05
- Added MODULE_LICENSE() tag, for compatibility with the tainted/
  non-tainted kernel stuff being put into 2.4.9-ac and modutils.
- VIA C3 support is not "preliminary" any more. Testing has revealed
  that the reserved bits in the C3's EVNTSEL1 have no function and
  need not be preserved. The driver now fills these bits with zeroes.
  (Thanks to Dave Jones @ SuSE for running these tests.)
- Minor bug fix in the perfctr interrupt assembly code.
  (Inherited from the 2.4 kernel. Fixed in 2.4.9-ac4.)

Version 2.1.1, 2001-08-28
- Preliminary recognition of Pentium 4 processors, including
  checking the IA32_MISC_ENABLE MSR.
- Moved %cr4 access functions from <asm-i386/perfctr.h> to
  x86_compat.h, to work around changes in 2.4.9-ac3.
- More %cr4 cleanups possible since the removal of dodgy_tsc()
  in Version 2.1: moved {set,clear}_in_cr4_local() into x86.c,
  and eliminated the set_in_cr4() compat macro.
- Fixed a bug in x86.c:finalise_backpatching(): the fake cstatus
  mustn't include i-mode counters unless we have PCINT support.
  Failure to check this caused fatal init-time oopses in some
  configs (CONFIG_X86_UP_APIC set but no local APIC in the CPU).
- Minor comment updates in x86.c due to AMD #22007 Revision J.
- Removed '%' before 'cr4' in printouts from x86_tests.c, to
  avoid the '%' being mutated by log-reading user-space code.

Version 2.1, 2001-08-19
- Fixed a call backpatching bug, caused by an incompatibility
  between the 2.4 and 2.2 kernels' xchg() macros. The 2.2 version
  lacks a "volatile", causing gcc to remove the entire statement
  if xchg() is used for side-effect only. Reverted to a plain
  assignment, which is safe since the 2.0.1 backpatching changes.
- Fixed a bug where an attempt to use /proc/<pid>/perfctr on an
  unsupported processor would cause a (well-behaved) kernel oops,
  due to calling a NULL function pointer in x86.c. vperfctr_open()
  now returns -ENODEV if virtual.c hasn't been initialised.
- Removed the WinChip configuration option, the dodgy_tsc() callback,
  and the clr_cap_tsc() x86_compat macro. WinChip users should configure
  for generic 586 or less and use the kernel's "notsc" boot parameter.
  This cleans up the driver and the 2.4 kernel patches, at the expense
  of more code in the 2.2 kernel patches to implement "notsc" support.
- Minor cleanup: moved version number definition from init.c to
  a separate file, version.h.

Version 2.0.1, 2001-08-14
- The unsynchronised backpatching in x86.c didn't work on SMP,
  due to Pentium III erratum E49, and similar errata for other
  P6 processors. (The change in 2.0-pre6 was insufficient.)
  x86.c now finalises the backpatching at driver init time,
  by "priming" the relevant code paths. To make this feasible,
  the isuspend() and iresume() methods are now merged into
  the other high-level methods; virtual.c became a bit cleaner.
- Removed obsolete "WinChip pmc_map[] must be identity" check.

Version 2.0, 2001-08-08
- Resurrected partial support for interrupt-mode virtual perfctrs.
  virtual.c permits a single i-mode perfctr, in addition to TSC
  and a number of a-mode perfctrs. BUG: The i-mode PMC must be last,
  which constrains CPUs like the P6 where we currently restrict
  the pmc_map[] to be the identity mapping. (Not a problem for
  K7 since it is symmetric, or P4 since it is expected to use a
  non-identity pmc_map[].)
  New perfctr_cpu_ireload() procedure to force reload of i-mode
  PMCs from their start values before resuming. Currently, this
  just invalidates the CPU cache, which forces the following
  iresume() and resume() to do the right thing.
  perfctr_cpu_update_control() now calls setup_imode_start_values()
  to "prime" i-mode PMCs from the control.ireset[] array.
- Bug fix in perfctr_cpu_update_control(): start by clearing cstatus.
  Prevents a failed attempt to update the control from leaving the
  object in a state with old cstatus != 0 but new control.

Version 2.0-pre7, 2001-08-07
- Cleaned up the driver's debugging code (virtual, x86).
- Internal driver rearrangements. The low-level driver (x86) now handles
  sampling/suspending/resuming counters. Merged counter state (sums and
  start values) and CPU control data to a single "CPU state" object.
  This simplifies the high-level drivers, and permits some optimisations
  in the low-level driver by avoiding the need to buffer tsc/pmc samples
  in memory before updating the accumulated sums (not yet implemented).
- Removed the read_counters, write_control, disable_rdpmc, and enable_rdpmc
  methods from <asm/perfctr.h>, since they have been obsoleted by the
  new suspend/resume/sample methods.
- Rearranged the 'cstatus' encoding slightly by putting 'nractrs' in
  the low 7 bits; this was done because 'nractrs' is retrieved more
  often than 'nrctrs'.
- Removed the obsolete 'status' field from vperfctr_state. Exported
  'cstatus' and its access methods to user-space. (Remove the
  control.tsc_on/nractrs/nrictrs fields entirely?)
- Removed WinChip "fake TSC" support. The user-space library can now
  sample with slightly less overhead on sane processors.
- WinChip and VIA C3 now use p5mmx_read_counters() instead of their
  own versions.
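
One way to picture the 'cstatus' encoding is the sketch below. Only
"nractrs in the low 7 bits" comes from the notes; the positions of the
other fields and all toy_* names are assumptions made for the
illustration.

```c
/* ASSUMED layout: bit 31 = tsc_on, bits 8..14 = nrctrs (a-mode plus
 * i-mode counters), bits 0..6 = nractrs. */
static unsigned int toy_mk_cstatus(unsigned int tsc_on,
				   unsigned int nractrs,
				   unsigned int nrictrs)
{
	return (tsc_on << 31) | ((nractrs + nrictrs) << 8) | nractrs;
}

/* The most frequently used accessor needs a mask but no shift. */
static unsigned int toy_cstatus_nractrs(unsigned int cstatus)
{
	return cstatus & 0x7F;
}

static unsigned int toy_cstatus_nrctrs(unsigned int cstatus)
{
	return (cstatus >> 8) & 0x7F;
}
```

A cstatus of zero then means "nothing enabled", which lets hot paths
test a single word.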

Version 2.0-pre6, 2001-07-27
- New patches for kernels 2.4.6, 2.4.7, and 2.4.7-ac1.
- Sampling bug fix for SMP. Normally processes are suspended and
  resumed many times per second, but on SMP machines it is possible
  for a process to run for a long time without being suspended.
  Since sampling is performed at the suspend and resume actions,
  a performance counter may wrap around more than once between
  sampling points. When this occurs, the accumulated counts will
  be highly variable and much lower than expected.
  A software timer is now used to ensure that sampling deadlines
  aren't missed on SMP machines. (The timer is run by the same code
  which runs the ITIMER_VIRTUAL interval timer.)
- Bug fix in the x86 "redirect call" backpatching routine. To be
  SMP safe, a bus-locked write to the code must be used.
- Bug fix in the internal debugging code (CONFIG_PERFCTR_DEBUG).
  The "shadow" data structure used to detect if a process' perfctr
  pointer has been clobbered could cause lockups with SMP kernels.
  Rewrote the code to be simpler and more robust.
- Minor performance tweak for the P5/P5MMX read counters procedures,
  to work around the P5's cache which doesn't allocate a cache line
  on a write miss.
- To avoid undetected data layout mismatches, the user-space library
  now checks the data layout version field in a virtual perfctr when
  it is being mmap:ed into the user's address space.
- A few minor cleanups.  
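
The sampling problem described above comes down to modular
arithmetic. The sketch below ASSUMES that counters are sampled as
32-bit values and accumulated into 64-bit sums; toy_accumulate() is a
hypothetical helper, not a driver function.

```c
/* Toy accumulation step: add the events since 'start' to a 64-bit
 * sum.  The delta is computed modulo 2^32. */
static unsigned long long toy_accumulate(unsigned long long sum,
					 unsigned int start,
					 unsigned int now)
{
	return sum + (unsigned int)(now - start);
}
```

A single wrap of the 32-bit value between two samples is harmless,
since the modular subtraction still yields the right delta. If 2^32 or
more events occur between samples, the delta silently loses a multiple
of 2^32, which is why the accumulated count ends up far too low.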

Version 2.0-pre5, 2001-06-11
- Internally use a single 'cstatus' field instead of the three
  tsc_on/nractrs/nrictrs fields. Should reduce overhead slightly.
- Reorder the fields in cpu_control so that 'cstatus' and other
  frequently used fields get small offsets -- avoids some disp32
  addressing modes in timing-critical code.
- Fixed a bug in p6_iresume where it forgot to invalidate the
  EVNTSEL cache, causing p6_write_control to fail to reload the
  MSRs. (K7 had a similar bug.) Since i-mode support is disabled
  at the moment, no-one was actually bitten by this.
- Fixed another iresume/write_control cache invalidation bug where a
  switch to an "uninitialised" CPU would fail to initialise the MSRs.
- Added a CONFIG_PERFCTR_DEBUG option to enable internal consistency
  checks. Currently, this checks that a task's vperfctr pointer
  isn't clobbered behind our backs, that resume and suspend for
  a vperfctr are performed on the same CPU, and that the EVNTSEL
  cache is semi-consistent when reloading is optimised away.
  ("semi" because it only checks that the cache agrees with the
  user's control data, and not that the cache agrees with the MSRs.)
- Minor cleanups.

Version 2.0-pre4, 2001-04-30
- Cleanups in x86.c. #defines introduced for magic constants.
  More sharing of procedures between different CPU drivers.
  Fixed a bug where k7_iresume() could cause k7_write_control()
  to fail to reload the correct EVNTSELs.
  The WinChip C6/2/3 driver now "fakes" an incrementing TSC.
- General cleanups: s/__inline__/inline/ following Linux kernel
  coding standards, and renamed the low-level control objects to
  cpu_control to distinguish them from {v,g}perfctr_control objects.
- O_CREAT is now interpreted when /proc/self/perfctr is opened:
  if the vperfctr does not exist, then it is created; if the
  vperfctr does exist, then EEXIST is returned (unfortunately
  O_EXCL doesn't work, since it's intercepted by the VFS layer).
  "perfex -i" uses this to avoid having to create a vperfctr when
  only an INFO command is to be issued.
  libperfctr.c:vperfctr_open() uses this to decide whether to
  UNLINK the newly opened vperfctr in case of errors or not.
- Cleaned up virtual.c's 2.4/2.2 VFS interface code a little,
  and eliminated the OWNER_THIS_MODULE compat macro.
- Added MOD_{INC,DEC}_USE_COUNTs to virtual.c's file_operations
  open and release procedures for 2.2 kernels. This should
  simulate 2.4's fops_get/put at ->open() and ->release().

Version 2.0-pre3, 2001-04-17
- Interrupt-mode virtual perfctrs are temporarily disabled since
  x86.c doesn't yet detect which PMC overflowed. The old API
  could be made to work, but it was broken anyway.
- Integrated the new P4-ready data structures and APIs.
  The driver compiles but the user-space stuff hasn't been
  updated yet, so there may be some remaining bugs.

  I have not yet committed to all details of this API. Some
  things, like accumulating counters in virtual.c and global.c,
  are uglier now, and going from a single "status == nrctrs"
  field to three separate fields (tsc_on, nrctrs, nrictrs)
  cannot be good for performance.

  In the new API the control information is split in separate
  arrays depending on their use, i.e. a struct-of-arrays layout
  instead of an array-of-struct layout. The advantage of the
  struct-of-arrays layout is that it should cause fewer cache
  lines to be touched at the performance-critical operations.
  The disadvantage is that the layout changes whenever the
  number of array elements has to be increased -- as is the
  case for the future Pentium 4 support (18 counters).
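
  To make the layout trade-off concrete, here is a small sketch of the
  two alternatives. The field names (evntsel, pmc_map, ireset) and the
  counter count are assumptions for illustration and are not taken from
  the driver's headers.

```c
#include <stddef.h>

#define NR_CTRS 4	/* assumed number of counters, illustration only */

/* Array-of-structs: all fields of one counter are adjacent. */
struct ctr_aos {
	unsigned int evntsel;
	unsigned int pmc_map;
	int ireset;
};
struct control_aos {
	struct ctr_aos ctr[NR_CTRS];
};

/* Struct-of-arrays: one array per kind of field. Note that the
 * offset of pmc_map[] and ireset[] depends on NR_CTRS, which is
 * why the layout changes when more counters are added. */
struct control_soa {
	unsigned int evntsel[NR_CTRS];
	unsigned int pmc_map[NR_CTRS];
	int ireset[NR_CTRS];
};

/* Bytes spanned when walking every evntsel value in each layout. */
static size_t evntsel_span_aos(void)
{
	return (NR_CTRS - 1) * sizeof(struct ctr_aos)
		+ offsetof(struct ctr_aos, evntsel) + sizeof(unsigned int);
}

static size_t evntsel_span_soa(void)
{
	return sizeof(((struct control_soa *)0)->evntsel);
}
```

  A loop that only touches evntsel values spans fewer bytes, and so
  typically fewer cache lines, in the struct-of-arrays form.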

Version 2.0-pre2, 2001-04-07
- Removed automatic inheritance of per-process virtual perfctrs
  across fork(). Unless wait4() is modified, it's difficult to
  communicate the final values back to the parent: the now
  abandoned code did this in a way which made it impossible
  to distinguish one child's final counts from another's.
  Inheritance can be implemented in user-space anyway, so the
  loss is not great. The interface between the driver and the rest
  of the kernel is now smaller and simpler than before.
- Simulating cpu_khz by a macro in very old kernels broke since
  there's also a struct field with that name :-( Instead of
  putting the ugly workaround back in, I decided to drop support
  for kernels older than 2.2.16.
- Preliminary support for the VIA C3 processor -- the C3 is
  apparently a faster version of the VIA Cyrix III.
- Added rdtsc cost deduction to the init tests code, and changed
  it to output per-instruction costs as well.
- More cleanups, making 2.2 compatibility crud less visible.

Version 2.0-pre1, 2001-03-25
- First round of API and coding changes/cleanups for version 2.0:
  made perfctr_info.version a string, moved some perfctr_info inits
  to x86.c and eliminated some redundant variables, removed dead VFS
  code from virtual.c, removed obsolete K7 tests from x86_tests.c,
  removed mmu_cr4_features wrappers from x86_compat.h, minor cleanup
  in virtual_stub.c.
- Fixed an include file problem which made some C compilers (not gcc)
  fail when compiling user-space applications using the driver.
- Added missing EXPORT_SYMBOL declarations needed by the UP-APIC PM
  code when the driver is built as a module.
- Preliminary changes in x86.c to deal with UP-APIC power management
  issues in 2.4-ac kernels. The PM callback is only a stub for now.

Version 1.9, 2001-02-13
- Fixed compilation problems for 2.2 and SMP kernels.
- Found updated documentation on "VIA Cyrix III". Apparently, there
  are two distinct chips: the older Joshua (a Cyrix design) and the
  newer Samuel (a Centaur design). Our current code supported Joshua,
  but mistook Samuel for Joshua. Corrected the identification of Samuel
  and added explicit support for it. Samuel's EVNTSEL1 is not well-
  documented, so there are some new Samuel-specific tests in x86_tests.c.
- Added preliminary interrupt-mode support for AMD K7.
- Small tweaks to virtual.c's interrupt handling.

Version 1.8, 2001-01-23
- Added preliminary interrupt-mode support to virtual perfctrs.
  Currently for P6 only, and the local APIC must have been enabled.
  Tested on 2.4.0-ac10 with CONFIG_X86_UP_APIC=y.
  When an i-mode vperfctr interrupts on overflow, the counters are
  suspended and a user-specified signal is sent to the process. The
  user's signal handler can read the trap pc from the mmap:ed vperfctr,
  and should then issue an IRESUME ioctl to restart the counters.
  The next version will support buffering and automatic restart.
- Some cleanups in the x86.c init and exit code. Removed the implicit
  smp_call_function() calls from x86_compat.h.

Version 1.7, 2001-01-01
- Updated Makefile for 2.4.0-test13-pre3 Rules.make changes.
- Removed PERFCTR_ATTACH ioctl from /dev/perfctr, making the
  vperfctrs only accessible via /proc/self/perfctr. Removed
  the "attach" code from virtual.c, and temporarily commented
  out the "vperfctr fs" code. Moved /dev/perfctr initialisation
  and implementation from init.c to global.c.
- Eliminated CONFIG_VPERFCTR_PROC, making /proc/pid/perfctr
  mandatory if CONFIG_PERFCTR_VIRTUAL is set.
- Some 2.2/2.4 compatibility cleanups.
- VIA Cyrix III detection bug fix. Contrary to VIA's documentation,
  the Cyrix III vendor field is Centaur, not Cyrix.

Version 1.6, 2000-11-21
- Preliminary implementation of /proc/pid/perfctr. Seems to work,
  but virtual.c and virtual_stub.c are again filled with
  #if LINUX_VERSION_CODE crap which will need to be cleaned up.
  The INFO ioctl is now implemented by vperfctrs too, to avoid the
  need for opening /dev/perfctr.
- virtual.c now puts the perfctr pointer in filp->private_data
  instead of inode->u.generic_ip. The main reason for this change
  is that proc-fs places a dentry pointer in inode->u.generic_ip.
- sys_vperfctr_control() no longer resets the virtual TSC
  if it already is active. The virtual TSC therefore runs
  continuously from its first activation until the process
  stops or unlinks its vperfctrs.
- Updates for 2.4.0-test11pre6. Use 2.4-style cpu_has_XXX
  feature testing macros. Updated x86_compat.h to implement
  missing cpu_has_mmx and cpu_has_msr, and compatibility
  macros for 2.2. Changed vperfctr_fs_read_super() to use
  new_inode(sb) instead of get_empty_inode() + some init code.
- Updates for 2.4.0-test9. Fixed x86_compat.h for cpu_khz change.
  Since drivers/Makefile was converted to the new list style,
  it became more difficult to handle CONFIG_PERFCTR=m. Changed
  Config.in to set CONFIG_KPERFCTR=y when CONFIG_PERFCTR != n,
  resulting in a much cleaner kernel patch for 2.4.0-test9.
- Removed d_alloc_root wrapper since 2.2 doesn't need it any more.
- When building for 2.2.18pre, use some of its 2.4 compatibility
  features (module_init, module_exit and DECLARE_MUTEX).
- Updates for 2.4.0-test8: repaired kernel patch for new parameter
  in do_fork, and fixed CLONE_PERFCTR conflict with CLONE_THREAD.

Version 1.5, 2000-09-03
- Dropped support for intermediate 2.3 and early 2.4.0-test kernels.
  The code now supports kernels 2.2.xx and 2.4.0-test7 or later only.
  Cleanups in compat.h and virtual.c.
- Rewrote the Makefile to use object file lists instead of conditionals.
  This gets slightly hairy since kernel extensions are needed even
  when the driver proper is built as a module.
- Removed the definition of CONFIG_PERFCTR_X86 from Config.in.
  Use the 2.4 standard CONFIG_X86 instead. The 2.2.xx kernel
  patches now define CONFIG_X86 in arch/i386/config.in.
- Cleaned up the vperfctr inheritance filter. Instead of setting
  a disable flag (CLONE_KTHREAD) when kernel-internal threads are
  created, I now set CLONE_PERFCTR in sys_fork and sys_vfork.
- /dev/perfctr no longer accepts the SAMPLE and UNLINK ioctls.
  All operations pertaining to a process' virtual perfctrs must
  be applied to the fd returned from the ATTACH ioctl.
- Removed the remote-control features from the virtual perfctrs.
  Significant simplifications in virtual.c. Removed some now
  unused stuff from compat.h and virtual_stub.c.

Version 1.4, 2000-08-11
- Fixed a memory leak bug in virtual.c. An extraneous dget() in
  get_vperfctr_filp() prevented reclaiming the dentry and inode
  allocated for a vperfctr file.
- Major changes to the VFS interface in virtual.c. Starting with
  2.4.0-test6, inode->i_sb == NULL no longer works. Added code to
  register a "vperfctr" fs and define a superblock and a mount point.
  Completely rewrote the dentry init code. Most of the new code is
  adapted from fs/pipe.c, with simplifications and macros to continue
  supporting 2.2.x kernels. `ls -l /proc/*/fd/' now prints recognizable
  names for vperfctr files.
- Cleaned up virtual.c slightly. Removed "#if 1" tests around the
  vperfctr inheritance code. Rewrote vperfctr_alloc and vperfctr_free
  to use the virt_to_page and {Set,Clear}PageReserved macros;
  also updated compat.h to provide these for older kernels.
- Updated for 2.4.0-test3: a dummy `open' file operation is no longer
  required by drivers/char/misc.c.
- Updated for `owner' field in file_operations added in 2.4.0-test2.
  Removed MOD_{INC,DEC}_USE_COUNT from init.c (except when compiling
  for 2.2.x) and virtual.c. Added MOD_{INC,DEC}_USE_COUNT to the
  reserve/release functions in x86.c -- needed because the driver
  may be active even if no open file refers to it. Using can_unload
  in the module struct instead is possible but not as tidy.

Version 1.3, 2000-06-29
- Implemented inheritance for virtual perfctrs: fork() copies the
  evntsel data to the child, exit() stops the child's counters but
  does not detach the vperfctr object, and wait() adds the child's
  counters to the parent's `children' counters.
  Added a CLONE_KTHREAD flag to prevent inheritance to threads
  created implicitly by request_module() and kernel_thread().
- Fixed a half-broken printk() in x86_tests.c.
- Added checks to virtual.c to prevent the remote-control interface
  from trying to activate dead vperfctrs.
- Updated vperfctr_attach() for changes in 2.3.99-pre7 and 2.4.0-test2.
- Fixed a problem introduced in 1.2 which caused linker errors if
  CONFIG_PERFCTR=m and CONFIG_PERFCTR_INIT_TESTS=y.
- Export CPU kHz via a new field in PERFCTR_INFO ioctl, to enable
  user-space to map accumulated TSC counts to actual time.
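
  A sketch of how user-space might use the exported kHz value to turn
  an accumulated TSC count into time. The function name is hypothetical
  and the code assumes the TSC ticks at the reported CPU frequency.

```c
#include <stdint.h>

/* Convert a TSC count to microseconds given the CPU clock in kHz.
 * tsc / (khz * 1000) seconds equals tsc * 1000 / khz microseconds.
 * The quotient and remainder are scaled separately so that the
 * multiplication cannot overflow for large tsc values.
 * Returns 0 if cpu_khz is 0 (frequency unknown). */
static uint64_t tsc_to_usecs(uint64_t tsc, uint32_t cpu_khz)
{
	if (cpu_khz == 0)
		return 0;
	return (tsc / cpu_khz) * 1000 + (tsc % cpu_khz) * 1000 / cpu_khz;
}
```

  For example, 450000000 cycles on a 450 MHz processor (cpu_khz ==
  450000) correspond to one second.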

Version 1.2, 2000-05-24
- Added support for generic x86 processors with a time-stamp counter
  but no performance-monitoring counters. By using the driver to
  virtualise the TSC, accurate cycle-count measurements are now
  possible on PMC-less processors like the AMD K6.
- Removed some of the special-casing of the x86 time-stamp counter.
  It's now "just another counter", except that no evntsel is
  needed to enable it.
- WinChip bug fix: the "fake TSC" code would increment an
  uninitialised counter.
- Reorganised the x86 driver. Moved the optional init-time testing
  code to a separate source file.
- Miscellaneous code cleanups and naming convention changes.

Version 1.1, 2000-05-13
- vperfctr_attach() now accepts pid 0 as an alias for the current
  process. This reduces the number of getpid() calls needed in
  the user-space library. (Suggested by Ulrich Drepper.)
- Added support for the VIA Cyrix III processor.
- Tuned the x86 driver interface. Replaced function pointers
  with stubs which rewrite callers to invoke the correct callees.
- Added ARRAY_SIZE definition to compat.h for 2.2.x builds.
- Updated for 2.3.48 inode changes.
- Moved code closer to 2.3.x coding standards. Removed init_module
  and cleanup_module, added __exit, module_init, and module_exit,
  and extended "compat.h" accordingly. Cleaned up <linux/perfctr.h>
  and <asm-i386/perfctr.h> a little.

Version 1.0, 2000-01-31
- Prepared the driver to cope with non-x86 architectures:
  - Moved generic parts of <asm-i386/perfctr.h> to <linux/perfctr.h>.
  - Merged driver's private "x86.h" into <asm-i386/perfctr.h>.
  - Config.in now defines CONFIG_PERFCTR_${ARCH}, and Makefile uses
    it to select appropriate arch-dependent object files.
- The driver now reads the low 32 bits of the counters,
  instead of 40 or 48 bits zero-extended to 64 bits.
  Sums are still 64 bits. This was done to reduce the number
  of cache lines needed for certain data structures, to
  simplify and improve the performance of the sampling
  procedures, and to change 64+(64-64) arithmetic to 64+(32-32)
  for the benefit of gcc on x86. This change doesn't reduce
  precision, as long as no event occurs more than 2^32 times
  between two sampling points.
- PERFCTR_GLOBAL_READ now forces all CPUs to be sampled, if the
  sampling timer isn't running.
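
  The 64+(32-32) arithmetic mentioned above can be sketched as follows.
  The struct and function names are made up for this example; only the
  arithmetic is the point.

```c
#include <stdint.h>

struct ctr_state {
	uint64_t sum;	/* accumulated 64-bit total */
	uint32_t start;	/* low 32 bits seen at the previous sampling point */
};

/* Add the events counted since the previous sample. The 32-bit
 * subtraction is modulo 2^32, so a counter that wrapped once still
 * yields the right delta as long as fewer than 2^32 events occurred
 * between the two samples. */
static void ctr_sample(struct ctr_state *st, uint32_t now)
{
	st->sum += (uint32_t)(now - st->start);	/* 64 + (32 - 32) */
	st->start = now;
}
```

  Starting from start == 0xFFFFFFF0, a sample of 0x10 adds 0x20 to the
  sum even though the low 32 bits wrapped in between.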

Version 0.11, 2000-01-30
- Added a missing EXPORT_SYMBOL which prevented the driver
  from being built as a module in SMP kernels.
- Support for the CPU sampling instructions (i.e. RDPMC and
  RDTSC on x86) is now announced explicitly by PERFCTR_INFO.
- The x86 hardware driver now keeps CR4.PCE globally enabled.
  There are two reasons for this. First, the cost of toggling
  this flag at process suspend/resume is high. Second, changes
  in kernel 2.3.40 imply that any processor's %cr4 may be updated
  asynchronously from the global variable mmu_cr4_features.

Version 0.10, 2000-01-23
- Added support for global-mode perfctrs (global.c).
- There is now a config option controlling whether to
  perform init-time hardware tests or not.
- Added a hardware reserve/release mechanism so that multiple
  high-level services don't simultaneously use the hardware.
- The driver is now officially device <char,major 10,minor 182>.
- Tuned the 64-bit tsc/msr/pmc read operations in x86.c.
- Support for virtual perfctrs can now be enabled or disabled
  via CONFIG_PERFCTR_VIRTUAL.
- Added support for the WinChip 3 processor.
- Split the code into several files: x86.c (x86 drivers),
  virtual.c (virtualised perfctrs), setup.c (boot-time actions),
  init.c (driver top-level and init code).

Version 0.9, 2000-01-02
- The driver can now be built as a module.
- Dropped sys_perfctr() system call and went back to using a
  /dev/perfctr character device. Generic operations are now
  ioctl commands on /dev/perfctr, and control operations on
  virtual perfctrs are ioctl commands on their file descriptors.
  Initially this change was done because new system calls in 2.3.x
  made maintenance and binary compatibility with 2.2.x hard, but
  the new API is actually cleaner than the previous system call.
- Moved this code from arch/i386/kernel/ to drivers/perfctr/.

Version 0.8, 1999-11-14
- Made the process management callback functions inline to
  reduce scheduling overhead for processes not using perfctrs.
- Changed the 'status' field to contain the number of active
  counters. Changed read_counters, write_control, and accumulate
  to use this information to avoid unnecessary work.
- Fixed a bug in k7_check_control() which caused it to
  require all four counters to be enabled.
- Fixed sys_perfctr() to return -ENODEV instead of -ENOSYS
  if the processor doesn't support perfctrs.
- Some code cleanups.
- Evntsel MSRs are updated lazily, and counters are not written to.

  The following table lists the costs (in cycles) of various
  instructions which access the counter or evntsel registers.
  The table was derived from data collected by init-time tests
  run by previous versions of this driver.

  Processor		P5	P5MMX	PII	PIII	K7
  Clock freq. (MHz)	133	233	266	450	500

  RDPMC			n/a	14	31	36	13
  RDMSR (counter)	29	28	81	80	52
  WRMSR (counter)	35	37	97	115	80
  WRMSR (evntsel)	33	37	88	105	232

  Several things are apparent from this table:

  1. It's much cheaper to use RDPMC than RDMSR to read the counters.
  2. It's much more expensive to reset a counter than to read it.
  3. It's expensive to write to an evntsel register.

  As of version 0.8, this driver uses the following strategies:
  * The evntsel registers are updated lazily. A per_cpu_control[]
    array caches the contents of each CPU's evntsel registers,
    and only when a process requires a different setup are the
    evntsel registers written to. In most cases, this eliminates the
    need to reprogram the evntsel registers when switching processes.
    The older drivers would write to the evntsel registers both at
    process suspend and resume.
  * The counter registers are read both at process resume and suspend,
    and the difference is added to the process' accumulated counters.
    The older drivers would reset the counters at resume, read them
    at suspend, and add the values read to the accumulated counters.
  * Only those registers enabled by the user's control information
    are manipulated, instead of blindly manipulating all of them.
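
  The lazy evntsel update can be sketched in user-space C as below.
  The "MSRs" are simulated by a plain array and a write counter; the
  names, the counter count and the single-CPU cache are simplifications
  for illustration and do not reproduce the driver's code.

```c
#include <stdint.h>

#define NR_CTRS 2	/* assumed number of counters, illustration only */

static uint32_t hw_evntsel[NR_CTRS];	/* stand-in for the evntsel MSRs */
static uint32_t evntsel_cache[NR_CTRS];	/* what the MSRs are known to hold */
static unsigned int nr_wrmsr;		/* number of simulated WRMSRs */

static void fake_wrmsr(unsigned int i, uint32_t val)
{
	hw_evntsel[i] = val;
	++nr_wrmsr;
}

/* Program the first 'nractrs' evntsels for the process being resumed.
 * Only registers whose cached contents differ are written, and
 * registers the process does not use are left alone. */
static void write_control_lazy(const uint32_t *evntsel, unsigned int nractrs)
{
	unsigned int i;

	for (i = 0; i < nractrs; ++i) {
		if (evntsel_cache[i] != evntsel[i]) {
			evntsel_cache[i] = evntsel[i];
			fake_wrmsr(i, evntsel[i]);
		}
	}
}
```

  Resuming a process with the same setup twice in a row costs no
  WRMSRs the second time, which is the common case the table above
  argues for.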

Version 0.7, 1999-10-25
- The init-time checks in version 0.6 of this driver showed that
  RDMSR is a lot slower than RDPMC for reading the PMCs. The driver
  now uses RDPMC instead of RDMSR whenever possible.
- Added an mmap() operation to perfctr files. This allows any client
  to read the accumulated counter state without making a system call.
  The old "sync to user-provided buffer" method has been removed,
  as it entailed additional copy operations and only worked for the
  "active" process. The PERFCTR_READ operation has been replaced
  by a simpler PERFCTR_SAMPLE operation, for the benefit of pre-MMX
  Intel P5 processors which cannot sample counters in user-mode.
  This rewrite actually simplified the code.
- The AMD K7 should now be supported correctly. The init-time checks
  in version 0.6 of this driver revealed that each K7 counter has
  its own ENable bit. (Thanks to Nathan Slingerland for running the
  test and reporting the results to me.)
- Plugged a potential memory leak in perfctr_attach_task().
- No longer piggyback on prctl(); sys_perfctr() is a real system call.
- Some code cleanups.

Version 0.6, 1999-09-08
- Temporarily added some init-time code that checks the
  costs of RDPMC/RDMSR/WRMSR operations applied to perfctr MSRs,
  the semantics of the ENable bit on the Athlon, and gets
  the boot-time value of the WinChip CESR register.
  This code can be turned off by #defining INIT_DEBUG to 0.
- Preliminary support for the AMD K7 Athlon processor.
- The code will now build in both 2.3.x and 2.2.x kernels.

Version 0.5, 1999-08-29
- The user-space buffer is updated whenever state.status changes,
  even when a remote command triggers the change.
- Reworked and simplified the high-level code. All accesses
  now require an attached file in order to implement proper
  accounting and synchronisation. The only exception is UNLINK:
  a process may always UNLINK its own PMCs.
- Fixed counting bug in sys_perfctr_read().
- Improved support for the Intel Pentium III.
- Another WinChip fix: fake TSC update at process resume.
- The code should now be safe for 'gcc -fstrict-aliasing'.

Version 0.4, 1999-07-31
- Implemented PERFCTR_ATTACH and PERFCTR_{READ,CONTROL,STOP,UNLINK}
  on attached perfctrs. An attached perfctr is represented as a file.
- Fixed an error in the WinChip-specific code.
- Perfctrs now survive exec().

Version 0.3, 1999-07-22
- Interface now via sys_prctl() instead of /dev/perfctr.
- Added NYI stubs for accessing other processes' perfctrs.
- Moved to dynamic allocation of a task's perfctr state.
- Minor code cleanups.

Version 0.2, 1999-06-07
- Added support for WinChip CPUs.
- Restart counters from zero, not their previous values. This
  corrected a problem for Intel P6 (WRMSR writes 32 bits to a PERFCTR
  MSR and then sign-extends to 40 bits), and also simplified the code.
- Added support for syncing the kernel's counter values to a user-
  provided buffer each time a process is resumed. This feature, and
  the fact that the driver enables RDPMC in processes using PMCs,
  allows user-level computation of a process' accumulated counter
  values without incurring the overhead of making a system call.
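
  The P6 behaviour noted above (WRMSR writes 32 bits and sign-extends
  to 40) can be modelled in a few lines. This is a software model for
  illustration only, with a made-up function name; it does not touch
  any MSR.

```c
#include <stdint.h>

/* Model of a P6 PERFCTR MSR write as described above: only the low
 * 32 bits of the value are used, and bit 31 is copied into bits
 * 32-39 of the 40-bit counter. */
static uint64_t p6_perfctr_after_wrmsr(uint64_t val)
{
	uint64_t ctr = val & 0xffffffffULL;

	if (ctr & 0x80000000ULL)
		ctr |= 0xff00000000ULL;
	return ctr;	/* resulting 40-bit counter contents */
}
```

  A previously read 40-bit value therefore cannot in general be
  written back unchanged, whereas writing zero always yields zero.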

Version 0.1, 1999-05-30
- First public release.
