X86membench

Latency and bandwidth benchmarks for multicore and multiprocessor x86-based systems

A set of memory benchmarks for analyzing complex memory hierarchies, tailored to measuring the performance of data exchange between the cores of a multicore processor and between processors. Provided under the BSD License without any warranty. Use at your own risk.

contact: daniel.molka(at)tu-dresden.de

Prerequisites

  • Linux operating system
  • glibc >= 2.6
  • 64-bit x86 processors
    • Supposed to work on Intel processors up to Westmere and AMD processors up to Magny-Cours
    • More recent CPUs might need adaptations to the hardware detection
    • VIA CPUs are currently not supported

Recommended

  • kernel with hugetlbfs support
  • PAPI or PAPI-C to access PMU information
  • Power-management-invariant Time Stamp Counter (TSC, constant rate)
    • Should be available on all server CPUs (Xeon, Opteron) except dual-core Opterons (K8)
    • Desktop and especially mobile CPUs might report odd results
      • Completely disabling power management in the BIOS can help to get useful results
      • TSC synchronization can be forced with the kernels' BENCHIT_KERNEL_TSC_SYNC option
    • A system-wide synchronous TSC is recommended for the multiple-* kernels
      • set BENCHIT_KERNEL_TSC_SYNC to "disabled" if it is not available
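
On Linux, the CPU flags in /proc/cpuinfo usually indicate whether the TSC runs at a constant rate. The following is a minimal sketch, assuming a kernel that reports the constant_tsc and nonstop_tsc flags; the function name check_tsc_flags is made up for illustration and is not part of BenchIT.

```shell
#!/bin/sh
# Hypothetical helper (not part of BenchIT): report whether the CPU flags
# advertise a constant-rate and non-stop TSC.
# Usage: check_tsc_flags [cpuinfo-file]   (defaults to /proc/cpuinfo)
check_tsc_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    rc=0
    for flag in constant_tsc nonstop_tsc; do
        # Look only at the first "flags" line; -w matches the flag as a whole word
        if grep -m1 '^flags' "$cpuinfo" | grep -qw "$flag"; then
            echo "$flag: present"
        else
            echo "$flag: missing"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_tsc_flags without arguments inspects the running system. A missing flag does not prove that the measurements will be wrong, but it is a hint to check the power-management settings described above.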

Installation

  • Download benchit-snapshot.tar.gz from https://fusionforge.zih.tu-dresden.de/frs/?group_id=885
    • extract the files and run <your-BenchIT-inst-dir>/tools/FIRSTTIME to set up BenchIT
      • answer the questions carefully, as the answers affect result file naming
      • rerun this if you get result files called "unknown_unknown[...].bit"
    • use the information from <your-BenchIT-inst-dir>/tools/hw_detect/cpuinfo to fill out LOCALDEFS/<your_machine_name>_input_architecture
      • annoying, but can be very useful later on
      • the more information you add, the easier it is to interpret results later when you no longer know which machine they were created on
  • Download x86_membench.tar.gz from https://fusionforge.zih.tu-dresden.de/frs/?group_id=885
    • Add content of kernel/ and tools/ folders from archive to your BenchIT directory (omit the tools/ folder if you're using a snapshot of BenchIT's development version)
    • Optionally copy example output
      • The examples show what the results of each kernel should look like if executed correctly
      • If your results look totally different, don't hesitate to ask for assistance in setting up PARAMETERS properly
  • run hardware detection tool to check if your system is supported properly
    • cd <your-BenchIT-inst-dir>/tools/hw_detect
    • sh compile.sh
    • ./cpuinfo
    • if the output reports a wrong clock frequency, a wrong topology, or odd cache and TLB information, the benchmark results are likely to be wrong
    • if the output looks correct, chances are good that the benchmarks will work correctly
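
The three hardware detection commands above can be wrapped so that a failing step stops the sequence. This is only a sketch: the function name run_hw_detect is hypothetical, and the BenchIT installation directory has to be passed as the first argument.

```shell
#!/bin/sh
# Hypothetical wrapper around the hw_detect steps listed above.
# Usage: run_hw_detect <your-BenchIT-inst-dir>
run_hw_detect() {
    dir="$1/tools/hw_detect"
    if [ ! -d "$dir" ]; then
        echo "hw_detect directory not found: $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged;
    # "&&" stops at the first failing step.
    ( cd "$dir" && sh compile.sh && ./cpuinfo )
}
```

The output still has to be checked by hand against the known clock frequency, topology, and cache sizes of the machine.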

Usage

  • DISABLE DYNAMIC FREQUENCY SCALING
    • Use cpufreq governor "performance"
      • echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor (as root; a plain "> .../cpu*/..." redirect fails when the pattern matches more than one file)
    • Disable Intel Turbo Boost if the processor has this feature (e.g. Core i7, Xeon 5500 series, Xeon E3/E5/E7)
      • disable in BIOS or force second highest available frequency if Turbo is enabled in BIOS
      • cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies lists frequencies
      • limit /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq if appropriate
    • Disable AMD Turbo Core if the processor has this feature (e.g. Phenom X6, Opteron 6200 series)
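
The frequency settings above can be scripted. The sketch below uses the sysfs paths named in the text; the function names set_performance_governor and second_highest_freq are hypothetical, and scaling_available_frequencies is assumed to exist, which is not the case for every cpufreq driver.

```shell
#!/bin/sh
# Hypothetical helper: write the "performance" governor for every core.
# Usage: set_performance_governor [sysfs-cpu-root]
# The root defaults to /sys/devices/system/cpu; writing there requires root.
set_performance_governor() {
    root="${1:-/sys/devices/system/cpu}"
    count=0
    for gov in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$gov" ] || continue    # pattern did not match: no cpufreq support
        echo performance > "$gov" || return 1
        count=$((count + 1))
    done
    echo "governor set on $count core(s)"
    [ "$count" -gt 0 ]
}

# Hypothetical helper: print the second-highest entry of a
# scaling_available_frequencies file (a space-separated list in kHz).
# Usage: second_highest_freq <scaling_available_frequencies-file>
second_highest_freq() {
    tr ' ' '\n' < "$1" | grep -v '^$' | sort -rnu | sed -n '2p'
}
```

If Turbo Boost cannot be disabled in the BIOS, the value printed by second_highest_freq is a candidate for scaling_max_freq. Whether the highest listed entry really represents the turbo frequency depends on the driver, so check the list manually first.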
  • if possible, set up hugepages (e.g. with hugeadm) and mount hugetlbfs
    • >100 2MiB pages per core recommended (reduce BENCHIT_KERNEL_MAX if there are not enough 2MiB pages available)
    • set BENCHIT_KERNEL_HUGEPAGES to "0" if hugetlbfs is not mounted
    • if hugeadm is not available, execute as root:
# create hugepages:
mkdir -p /mnt/huge
echo <num_pages> >/proc/sys/vm/nr_hugepages
mount -t hugetlbfs nodev /mnt/huge
chmod 777 /mnt/huge
# free hugepages:
umount /mnt/huge
echo 0 >/proc/sys/vm/nr_hugepages
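
Whether enough hugepages are free can be checked before a run. The following sketch compares the HugePages_Free value from /proc/meminfo with the recommendation of more than 100 pages per core. The function name check_hugepages is hypothetical, the number of cores is passed in by the caller, and the hugepage size is assumed to be 2 MiB.

```shell
#!/bin/sh
# Hypothetical check: are more than 100 free hugepages per core available?
# Usage: check_hugepages <num_cores> [meminfo-file]   (defaults to /proc/meminfo)
check_hugepages() {
    cores="$1"
    meminfo="${2:-/proc/meminfo}"
    free=$(awk '/^HugePages_Free:/ { print $2 }' "$meminfo")
    free="${free:-0}"                 # line missing: no hugepage support
    threshold=$((cores * 100))
    echo "free hugepages: $free, threshold: $threshold"
    [ "$free" -gt "$threshold" ]
}
```

If the check fails, either reserve more pages or reduce BENCHIT_KERNEL_MAX as described above.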
  • Important parameters (edit PARAMETERS files in BENCHITROOT/kernel/arch_x86_64/memory_{bandwidth|latency}/C/pthread/{0|SSE2}/<benchmark_name>)
    • BENCHIT_KERNEL_{MIN|MAX|STEPS}
      • configure data set sizes
      • automatically chooses data set sizes suitable for display on a logarithmic scale
      • alternatively use BENCHIT_KERNEL_PROBLEMLIST to select certain dataset sizes
    • BENCHIT_KERNEL_CPU_LIST
      • select cores to run benchmark on
      • it's suggested to run each benchmark with all available cores once and then remove cores with redundant results
    • BENCHIT_KERNEL_USE_MODE
      • choose the initial coherency state of the data
      • suggested to try all available settings supported by your hardware
      • However, do not use "Owned" on Intel or "Forward" on AMD processors as it will fall back to "Modified" or "Shared", respectively
    • BENCHIT_KERNEL_SHARE_CPU
      • Required for USE_MODE shared, forward and owned
      • should be as far away (maximum number of QPI/HT hops) from the first selected CPU as possible
      • MUST NOT BE INCLUDED in CPU_LIST
    • BENCHIT_KERNEL_FLUSH_L{1|2|3}
      • enable/disable cache flushes
      • disabled by default, sometimes necessary to clearly distinguish between cache levels
    • BENCHIT_KERNEL_ALLOC
      • select NUMA behaviour
    • BENCHIT_KERNEL_{HUGEPAGES|HUGEPAGE_DIR}
      • configure hugetlbfs usage
      • STRONGLY RECOMMENDED: if hugepages are available, use them
      • latency results without hugepages are basically useless (memory latency increases monotonically as the average page table walk penalty rises)
    • BENCHIT_KERNEL_INSTRUCTION
      • bandwidth kernels only
      • select instruction for data transfers (mov,movdqa,movntdq,…)
    • additional single-r1w1 parameters
      • BENCHIT_KERNEL_LAYOUT - alignment of read and write buffers in memory
      • separate USE_MODE for read and write stream
      • BENCHIT_KERNEL_{READ|WRITE}_LOCAL - tie one stream to local memory (producer-consumer behavior)
      • BENCHIT_KERNEL_METHOD - select operation performed on read stream before writing
    • Default values are strongly recommended for the remaining parameters
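
The rule that the SHARE_CPU must not appear in CPU_LIST is easy to violate when editing PARAMETERS by hand. The sketch below checks it for two values passed on the command line. The function name check_share_cpu is hypothetical, and the CPU list is assumed to be a plain comma-separated list of core numbers; any other notation the PARAMETERS files may accept is not handled.

```shell
#!/bin/sh
# Hypothetical sanity check for the SHARE_CPU rule above.
# Assumes a plain comma-separated list of core numbers (e.g. "0,1,2").
# Usage: check_share_cpu <cpu_list> <share_cpu>
check_share_cpu() {
    list="$1"
    share="$2"
    # Surround both sides with commas so that "1" does not match "12"
    case ",$list," in
        *",$share,"*)
            echo "error: SHARE_CPU $share is part of CPU_LIST $list"
            return 1 ;;
        *)
            echo "ok: SHARE_CPU $share is not in CPU_LIST $list"
            return 0 ;;
    esac
}
```

For example, check_share_cpu "0,1,2" 7 succeeds, while check_share_cpu "0,1,2" 1 reports an error.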
  • compile and run like any other BenchIT kernel - COMPILE.SH/RUN.SH from command line or via GUI (recommended)
    • The default comment in the result plots is the date; this can be changed by editing the comment in the Config tab. If you change it to "<comment>", a summary of the parameters used for the measurement is displayed
    • use BenchIT's mixer feature to create combined graphs from multiple measurements

Publications