path: root/benchtests
Commit message | Author | Age | Files | Lines
* benchtests: Make bench-memcmp print json | Siddhesh Poyarekar | 2018-02-02 | 1 | -26/+50
    The benchmark result can now be studied using the compare_strings.py script.
    * benchtests/bench-memcmp.c: Print json instead of plain text.
* benchtests: Reallocate buffers for every test run | Siddhesh Poyarekar | 2018-02-02 | 1 | -10/+13
    Keeping the buffers the same across test runs gives later invocations the advantage since they access cached data. Reallocate so that all test runs are on equal grounds.
    * benchtests/bench-memcmp.c (do_test): Call realloc_buf for every test run.
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2018-01-01 | 82 | -82/+82
    * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights.
    * locale/programs/charmap-kw.h: Regenerated.
    * locale/programs/locfile-kw.h: Likewise.
* Convert strcmp benchmark output to json format | Siddhesh Poyarekar | 2017-12-15 | 1 | -27/+51
    The format is now parseable with the compare_strings.py script.
* benchtests: Enable BENCHSET to run subset of tests | Victor Rodriguez | 2017-11-28 | 2 | -20/+68
    This patch adds a BENCHSET variable to benchtests/Makefile in order to provide the capability to run a list of subsets of benchmark tests, e.g.:
        make bench BENCHSET="bench-pthread bench-math malloc-thread"
    This helps users to benchmark a specific glibc area.
    ChangeLog:
    * benchtests/Makefile: Add BENCHSET to allow subsets of benchmarks to be run.
    * benchtests/README: Add documentation for: Running subsets of benchmarks.
    Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com>
    Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
    Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
* benchtests: Expand range of tests names in schema.json | Victor Rodriguez | 2017-11-28 | 1 | -1/+1
    When executing bench-math, the benchmark output is reported as invalid with one of these error messages:
        Invalid benchmark output: 'workload-spec2006.wrf' does not match any of the regexes: '^[_a-zA-Z0-9]*$'
    or
        Invalid benchmark output: Additional properties are not allowed ('workload-spec2006.wrf' was unexpected)
    The error was seen when running the tests workload-spec2006.wrf, 'stack=1024,guard=1' and 'stack=1024,guard=2'. The problem is that the current regexes do not accept the hyphen, dot, equals sign and comma in the output. This patch changes the regex in benchout.schema.json to accept these symbols in benchmark test names.
    ChangeLog:
    * benchtests/scripts/benchout.schema.json: Fix regex to accept a wider range of test names.
    Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com>
    Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
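The widened pattern itself is not quoted in the commit message. As a rough sketch of the problem, the C helper below checks a test name against a pattern using POSIX regcomp/regexec. The helper name and the second pattern (which adds comma, equals sign, dot and hyphen to the character class) are assumptions made for illustration and are not necessarily what was committed to benchout.schema.json; the schema is also evaluated by a JSON schema validator rather than by POSIX regex, although these two simple patterns should behave the same way in both.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Pattern quoted in the error message above.  */
#define OLD_NAME_PATTERN "^[_a-zA-Z0-9]*$"
/* Assumed widened pattern (illustrative only): also allows , = . and -
   (the hyphen is last in the bracket expression so it is literal).  */
#define WIDE_NAME_PATTERN "^[_a-zA-Z0-9,=.-]*$"

/* Hypothetical helper: return 1 if NAME matches the POSIX extended
   regular expression PATTERN, 0 otherwise.  */
int
name_matches (const char *pattern, const char *name)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);   /* Both patterns above are valid.  */
  int matched = regexec (&re, name, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With these definitions, name_matches (OLD_NAME_PATTERN, "workload-spec2006.wrf") returns 0, while the assumed wider pattern accepts both that name and "stack=1024,guard=1".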
* benchtests: Adjust valid and accepted properties | Victor Rodriguez | 2017-11-28 | 1 | -1/+2
    Benchmark workload-spec2006.wrf does not produce max, min or mean results but instead produces throughput. This is represented in benchtests/bench-skeleton.c. This patch adjusts benchout.schema.json to consider bench.out from bench-math benchmarks as valid.
    ChangeLog:
    * benchtests/scripts/benchout.schema.json: Add throughput as accepted result from property and remove "max", "min" and "mean" from required properties based on benchtests/bench-skeleton.c.
    Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com>
    Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
* benchtests: Bump start size since smaller sizes are noisy | Siddhesh Poyarekar | 2017-11-20 | 3 | -3/+3
    Numbers for very small sizes (< 128B) are much noisier for non-cached benchmarks like the walk benchmarks, so don't include them.
    * benchtests/bench-memcpy-walk.c (START_SIZE): Set to 128.
    * benchtests/bench-memmove-walk.c (START_SIZE): Likewise.
    * benchtests/bench-memset-walk.c (START_SIZE): Likewise.
* benchtests: Fix walking sizes and directions for *-walk benchmarks | Siddhesh Poyarekar | 2017-11-20 | 3 | -21/+12
    Make the walking benchmarks walk only backwards since copying both ways is biased in favour of implementations that use non-temporal stores for larger sizes; falkor is one of them. This also fixes up bugs in computation of the result which ended up multiplying the length with the timing result unnecessarily.
    * benchtests/bench-memcpy-walk.c (do_one_test): Copy only backwards. Fix timing computation.
    * benchtests/bench-memmove-walk.c (do_one_test): Likewise.
    * benchtests/bench-memset-walk.c (do_one_test): Walk backwards on memset by N at a time. Fix timing computation.
* Benchtests for sinf, cosf and sincosf | Rajalakshmi Srinivasaraghavan | 2017-10-13 | 4 | -1/+10626
    Numbers used from cos and sin inputs.
* benchtests: Memory walking benchmark for memmove | Siddhesh Poyarekar | 2017-10-05 | 2 | -1/+144
    This benchmark is an attempt to eliminate cache effects from string benchmarks. The benchmark walks both ways through a large memory area and copies different sizes of memory and alignments one at a time instead of looping around in the same memory area. This is a good metric to have alongside the simple memmove benchmark (which is only really useful for smaller sizes) especially for larger sizes where the likelihood of the call being done only once is pretty high.
    This benchmark is different from memcpy in that it also tests overlapping copies.
    * benchtests/bench-memmove-walk.c: New file.
    * benchtests/Makefile (string-benchset): Add it.
* benchtests: Memory walking benchmark for memset | Siddhesh Poyarekar | 2017-10-05 | 2 | -1/+139
    This benchmark is an attempt to eliminate cache effects from string benchmarks. The benchmark walks backward through a large memory area and sets different sizes of memory and alignments one at a time instead of looping around in the same memory area. This is a good metric to have alongside the simple memset benchmark (which is only really useful for smaller sizes) especially for larger sizes where the likelihood of the call being done only once is pretty high.
    * benchtests/bench-memset-walk.c: New file.
    * benchtests/Makefile (string-benchset): Add it.
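The walking idea described above can be sketched roughly as follows. This is a simplified illustration and not the code in bench-memset-walk.c: the function names, the 1 MiB buffer and the 128-byte step are assumptions, and the timing and alignment handling of the real benchmark are left out. Each memset call touches a region that no earlier call has touched, so a call cannot benefit from data cached by the previous one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Walk backward through BUF (BUF_SIZE bytes), setting LEN bytes per call,
   so that no call touches memory already set by an earlier call.
   Returns the number of memset calls made.  */
size_t
walk_memset_backward (char *buf, size_t buf_size, size_t len, int c)
{
  assert (len > 0);
  size_t calls = 0;
  for (size_t end = buf_size; end >= len; end -= len)
    {
      memset (buf + end - len, c, len);
      calls++;
    }
  return calls;
}

/* Demo with an arbitrary 1 MiB buffer and 128-byte steps.  Returns the
   number of calls made, or 0 if allocation or the fill check failed.  */
size_t
walk_demo (void)
{
  size_t size = (size_t) 1 << 20;
  char *buf = malloc (size);
  if (buf == NULL)
    return 0;
  size_t calls = walk_memset_backward (buf, size, 128, 'x');
  int ok = buf[0] == 'x' && buf[size - 1] == 'x';
  free (buf);
  return ok ? calls : 0;
}
```

A real measurement would use a buffer much larger than the last-level cache and time the whole walk, dividing by the number of calls.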
* benchtests: Memory walking benchmark for memcpy | Siddhesh Poyarekar | 2017-10-05 | 2 | -1/+129
    This benchmark is an attempt to eliminate cache effects from string benchmarks. The benchmark walks both ways through a large memory area and copies different sizes of memory and alignments one at a time instead of looping around in the same memory area. This is a good metric to have alongside the other memcpy benchmarks, especially for larger sizes where the likelihood of the call being done only once is pretty high.
    * benchtests/bench-memcpy-walk.c: New file.
    * benchtests/Makefile (string-benchset): Add it.
* Add exp2f and log2f benchmark trace | Szabolcs Nagy | 2017-09-20 | 3 | -1/+5277
    exp2f and log2f benchmark traces are just copies of the existing expf and logf traces from wrf_r.
    * benchtests/Makefile: Add exp2f and log2f benchmarks.
    * benchtests/exp2f-inputs: Copy of expf-inputs.
    * benchtests/log2f-inputs: Copy of logf-inputs.
* Add logf trace | Wilco Dijkstra | 2017-09-19 | 2 | -1/+2889
    Add a trace for logf. This is a reduced trace based on 2.8 billion samples extracted from wrf_r.
    * benchtests/Makefile: Add logf benchmark.
    * benchtests/logf-inputs: Add reduced trace from wrf_r.
* Add expf trace | Wilco Dijkstra | 2017-09-19 | 2 | -1/+2389
    Add a trace for expf. This is a reduced trace based on 2.4 billion samples extracted from wrf_r.
    * benchtests/Makefile: Add expf benchmark.
    * benchtests/expf-inputs: Add reduced trace from wrf_r.
* Add benchtests for trunc and truncf. | Joseph Myers | 2017-09-19 | 3 | -1/+46
    This patch adds benchtests for the trunc and truncf functions. The inputs listed are fairly arbitrary; I do not assert they are representative of any particular application.
    * benchtests/Makefile (bench-math): Add trunc and truncf.
      (CFLAGS-bench-trunc.c): New variable.
      (CFLAGS-bench-truncf.c): Likewise.
    * benchtests/trunc-inputs: New file.
    * benchtests/truncf-inputs: Likewise.
* benchtests: New -g option to generate graphs in compare_strings.py | Siddhesh Poyarekar | 2017-09-16 | 1 | -3/+8
    The compare_strings.py script unconditionally generates a graph PNG image of the input data, which can be unnecessary and slow. Put this behind an optional flag -g.
    * benchtests/scripts/compare_strings.py: New option -g.
      (draw_graph): Print a message that a graph is being generated.
      (process_results): Generate graph only if -g is passed.
      (main): Process option -g.
* benchtests: Make compare_strings.py output a bit prettier | Siddhesh Poyarekar | 2017-09-16 | 1 | -9/+11
    Make the column widths for the outputs fixed so that they look a little less messy. They will still look bad with lots of IFUNCs (like on x86) but it's still a step forward.
    * benchtests/scripts/compare_strings.py (process_results): Better spacing for output.
* benchtests: Use argparse to parse arguments | Siddhesh Poyarekar | 2017-09-16 | 2 | -13/+35
    Make the script more usable by adding proper command line options along with a way to query the options. The script is capable of doing a bunch of things right now like choosing a base for comparison, choosing to generate graphs, etc. and they should be accessible via command line switches.
    * benchtests/scripts/compare_strings.py: Use argparse.
    * benchtests/README: Document existence of compare_strings.py.
* benchtests: Reallocate buffers for memset | Siddhesh Poyarekar | 2017-09-14 | 3 | -13/+47
    Keeping the same buffers along with copying the same size of data into the same location means that the first routine is typically the slowest since it has to bear the cost of fetching data into cache. Reallocating buffers stabilizes the numbers a bit.
    * benchtests/bench-string.h (realloc_bufs): New function.
      (test_init): Call it.
    * benchtests/bench-memset-large.c (do_test): Likewise.
    * benchtests/bench-memset.c (do_test): Likewise.
* benchtests: Make memset benchmarks print json | Siddhesh Poyarekar | 2017-09-14 | 2 | -31/+80
    Make the memset benchmarks (bench-memset and bench-memset-large) print their output in JSON so that they can be evaluated using the compare_strings.py script.
    * benchtests/bench-memset-large.c: Print output in JSON format.
    * benchtests/bench-memset.c: Likewise.
* Add new codepage charmaps/IBM858 [BZ #21084] | Mike FABIAN | 2017-09-14 | 1 | -0/+2
    This code page is identical to code page 850 except that X'D5' has been changed from LI61 (dotless i) to SC20 (euro symbol).
    The code points from /x01 to /x1f in the localedata/charmaps/IBM858 file have the same mapping as those in localedata/charmaps/ANSI_X3.4-1968. That means they disagree with ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00858.txt in that range. For example, localedata/charmaps/IBM858 and localedata/charmaps/ANSI_X3.4-1968 have:
        “<U0001> /x01 START OF HEADING (SOH)”
    whereas CP00858.txt has:
        “01 SS000000 Smiling Face”
    That means that CP00858.txt is not really ASCII-compatible, and to make it ASCII-compatible we deviate from CP00858.txt in the code points from /x01 to /x1f.
    [BZ #21084]
    * benchtests/strcoll-inputs/filelist#en_US.UTF-8: Add IBM858 and ibm858.c.
    * iconvdata/Makefile: Add IBM858.
    * iconvdata/gconv-modules: Add IBM858.
    * iconvdata/ibm858.c: New file.
    * iconvdata/tst-tables.sh: Add IBM858.
    * localedata/charmaps/IBM858: New file.
* benchtests: Do not compile benchmark objects as libc modules [BZ #21864] | Florian Weimer | 2017-08-21 | 1 | -4/+5
    Otherwise, this will lead to link failures due to hidden symbol references.
* Add math benchmark latency test | Wilco Dijkstra | 2017-08-17 | 2 | -10/+37
    This patch further improves math function benchmarking by adding a latency test in addition to throughput. This enables more accurate comparisons of the math functions. The latency test works by creating a dependency on the previous iteration: func_res = F (func_res * zero + input[i]). The multiply by zero avoids changing the input.
    It reports reciprocal throughput and latency in nanoseconds (depending on the timing header used) and max/min throughput in iterations per second:
        "workload-spec2006.wrf": {
          "reciprocal-throughput": 100,
          "latency": 200,
          "max-throughput": 1.0e+07,
          "min-throughput": 5.0e+06
        }
    * benchtests/bench-skeleton.c (main): Add support for latency benchmarking.
    * benchtests/scripts/bench.py: Add support for latency benchmarking.
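The dependency chain described above can be illustrated with a small sketch. This is not the code in bench-skeleton.c: the function names, the stand-in function square and the use of a volatile zero (so that the compiler cannot fold the multiplication away) are assumptions made for illustration, and the timing code is omitted.

```c
#include <stddef.h>

/* Stand-in for the math function under test.  */
static double
square (double x)
{
  return x * x;
}

/* Throughput-style loop: each call is independent of the previous one,
   so the CPU is free to overlap consecutive calls.  */
double
run_throughput (double (*f) (double), const double *input, size_t n)
{
  volatile double sink = 0.0;
  for (size_t i = 0; i < n; i++)
    sink = f (input[i]);
  return sink;
}

/* Latency-style loop: the argument of each call depends on the previous
   result, so calls cannot overlap.  For finite results, multiplying by
   zero leaves the input value unchanged.  */
double
run_latency (double (*f) (double), const double *input, size_t n)
{
  volatile double zero = 0.0;
  double func_res = 0.0;
  for (size_t i = 0; i < n; i++)
    func_res = f (func_res * zero + input[i]);
  return func_res;
}

/* Demo: both loops compute the same final value for the same inputs.  */
double
latency_demo (void)
{
  static const double input[] = { 1.0, 2.0, 3.0 };
  return run_latency (square, input, 3);
}
```

Timing each loop and dividing by the iteration count would give a reciprocal-throughput figure for the first and a latency figure for the second.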
* benchtests: Print json in memmove benchmark | Siddhesh Poyarekar | 2017-08-11 | 2 | -41/+87
    Make the memmove benchmarks (bench-memmove and bench-memmove-large) print their output in JSON so that they can be evaluated using the compare_strings.py script.
    * benchtests/bench-memmove-large.c: Print output in JSON format.
    * benchtests/bench-memmove.c: Likewise.
* benchtests: Remove verification runs from benchmark tests | Siddhesh Poyarekar | 2017-08-11 | 9 | -154/+2
    The test run is unnecessary and interferes with the benchmark. The tests are done during make check, so they're unnecessary here.
    * benchtests/bench-memccpy.c (do_one_test): Remove checks.
    * benchtests/bench-memchr.c (do_one_test): Likewise.
    * benchtests/bench-memcpy-large.c (do_one_test): Likewise.
    * benchtests/bench-memcpy.c (do_one_test): Likewise.
    * benchtests/bench-memmove-large.c (do_one_test): Likewise.
    * benchtests/bench-memmove.c (do_one_test): Likewise.
    * benchtests/bench-memset-large.c (do_one_test): Likewise.
    * benchtests/bench-memset.c (do_one_test): Likewise.
    * benchtests/bench-string.h (test_init): Remove memsets.
* benchtests: Avoid a display error when running in text terminal | Siddhesh Poyarekar | 2017-08-08 | 1 | -0/+2
    The compare_strings.py script generates a graph for the benchmarks it performs a comparison on and that fails if X is not available. Avoid the error and ensure that only the graph is generated and saved as a PNG file.
    * benchtests/scripts/compare_strings.py: Avoid display error when generating graph.
* benchtests: Allow selecting baseline for compare_strings.py | Siddhesh Poyarekar | 2017-08-08 | 1 | -10/+18
    This patch allows one to provide, using an optional -base option, the name of the function to compare all other functions against. This is useful when pitching one implementation of a string function against alternatives. In the absence of this option, comparisons are done against the first ifunc in the list.
    * benchtests/scripts/compare_strings.py (main): Add an optional -base option.
      (process_results): New argument base_func.
* benchtests: Use TEST_NAME instead of hardcoding memcpy | Siddhesh Poyarekar | 2017-08-08 | 3 | -3/+3
    The hardcoded 'memcpy' name turns up in other derived tests like mempcpy.
    * benchtests/bench-memcpy.c (test_main): Use TEST_NAME instead of hardcoding memcpy.
    * benchtests/bench-memcpy-large.c (test_name): Likewise.
    * benchtests/bench-memcpy-random.c (test_name): Likewise.
* benchtests: New script to parse memcpy results | Siddhesh Poyarekar | 2017-06-22 | 2 | -0/+173
    Read the memcpy results in json and print out the results in tabular form, in addition to generating a graph of the results to compare all of the implementations. The format of the output is extensible enough to allow this kind of analysis to be done on other string functions as well.
    * benchtests/scripts/benchout_strings.schema.json: New file.
    * benchtests/scripts/compare_strings.py: New file.
* benchtests: Make memcpy benchmarks print results in json | Siddhesh Poyarekar | 2017-06-22 | 3 | -49/+117
    Print the benchmark output for various memcpy benchmarks in json so that it can be predictably parsed and analyzed.
    * benchtests/bench-memcpy-large.c: Include json-lib.h.
      (do_one_test): Print json.
      (do_test): Likewise.
      (test_main): Likewise.
    * benchtests/bench-memcpy-random.c: Include json-lib.h.
      (do_one_test): Print json.
      (do_test): Likewise.
      (test_main): Likewise.
    * benchtests/bench-memcpy.c: Include json-lib.h.
      (do_one_test): Print json.
      (do_test): Likewise.
      (test_main): Likewise.
* benchtests: Print string array elements, int and uint in json | Siddhesh Poyarekar | 2017-06-22 | 2 | -0/+68
    Enhance the json module in benchtests to print signed and unsigned integers and string array elements.
    * benchtests/json-lib.h: Include inttypes.h.
      (json_attr_int, json_attr_uint, json_element_string, json_element_int, json_element_uint): New functions.
    * benchtests/json-lib.c (json_attr_int, json_attr_uint, json_element_string, json_element_int, json_element_uint): New functions.
* Add powf trace | Wilco Dijkstra | 2017-06-20 | 1 | -0/+2187
    Add a workload for powf. This is a reduced trace based on 2.3 billion samples extracted from wrf. The distribution of values, in particular frequency of commonly used operands is the same as in the full trace.
    * benchtests/powf-inputs: Add reduced trace from wrf.
* Improve math benchmark infrastructure | Wilco Dijkstra | 2017-06-20 | 2 | -18/+45
    Improve support for math function benchmarking. This patch adds a feature that allows accurate benchmarking of traces extracted from real workloads. This is done by iterating over all samples rather than repeating each sample many times (which completely ignores branch prediction and cache effects). A trace can be added to existing math function inputs via "## name: workload-<name>", followed by the trace.
    * benchtests/README: Describe workload feature.
    * benchtests/bench-skeleton.c (main): Add support for benchmarking traces from workloads.
* Add powf bench tests | Paul Clarke | 2017-06-20 | 2 | -1/+332
    Add powf() bench test with input which covers these cases:
    - positive base to positive exponent
    - exponent 0
    - negative base to even exponent
    - exponent 1
    - exponent -1
    - squared
    - square root
    - 1 to negative exponent
    - -1 to negative exponent
    - base 0
    - -1 to even exponent
    - small base
    - small exponent
    * benchtests/Makefile (bench-math): Add powf.
    * benchtests/powf-inputs: New file.
* nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988) | Adhemerval Zanella | 2017-06-14 | 3 | -1/+73
    The current allocate_stack logic for creating stacks is to first mmap all the required memory with the desired memory protection and then mprotect the guard area with PROT_NONE if required. Although it works as expected, it pessimizes the allocation because it requires the kernel to actually increase the commit charge (it counts against the physical/swap memory available to the system).
    The only difficulty is verifying this change, since the side effects are Linux specific and accounting for them would require kernel-specific tests that parse system-wide information. On the kernel I checked, /proc/self/statm does not show any meaningful difference for vmm and/or rss before and after thread creation. I could only see really meaningful information by checking the system-wide /proc/meminfo between thread creations: MemFree, MemAvailable, and Committed_AS show a large difference without the patch. I think trying to use this kind of information in a testcase is fragile.
    The BZ#18988 report shows that the committed pages are easily seen with mlockall (MCL_FUTURE) (which locks all pages that become mapped in the process); however, a more straightforward testcase shows that pthread_create could be faster using this patch:
    --
    #include <pthread.h>
    #include <stddef.h>

    static const int inner_count = 256;
    static const int outer_count = 128;
    static pthread_attr_t a;

    static void *thread1(void *arg) { return NULL; }

    static void *sleeper(void *arg)
    {
      pthread_t ts[inner_count];
      for (int i = 0; i < inner_count; i++)
        pthread_create (&ts[i], &a, thread1, NULL);
      for (int i = 0; i < inner_count; i++)
        pthread_join (ts[i], NULL);
      return NULL;
    }

    int main(void)
    {
      pthread_attr_init(&a);
      pthread_attr_setguardsize(&a, 1<<20);
      pthread_attr_setstacksize(&a, 1134592);
      pthread_t ts[outer_count];
      for (int i = 0; i < outer_count; i++)
        pthread_create(&ts[i], &a, sleeper, NULL);
      for (int i = 0; i < outer_count; i++)
        pthread_join(ts[i], NULL);
      return 0;
    }
    --
    On x86_64 (4.4.0-45-generic, gcc 5.4.0), running this small test I see:
        $ time ./test
        real 0m3.647s
        user 0m0.080s
        sys 0m11.836s
    While with the patch I see:
        $ time ./test
        real 0m0.696s
        user 0m0.040s
        sys 0m1.152s
    So I added a pthread_create benchtest (thread_create) which checks the thread creation latency. As with the simple test, I saw improvements in thread creation on all architectures on which I tested the change.
    Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu.
    [BZ #18988]
    * benchtests/thread_create-inputs: New file.
    * benchtests/thread_create-source.c: Likewise.
    * support/xpthread_attr_setguardsize.c: Likewise.
    * support/Makefile (libsupport-routines): Add xpthread_attr_setguardsize object.
    * support/xthread.h: Add xpthread_attr_setguardsize prototype.
    * benchtests/Makefile (bench-pthread): Add thread_create.
    * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and then mprotect the required area.
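The inverted ordering named in the ChangeLog entry for allocate_stack (mmap with PROT_NONE, then mprotect the required area) can be sketched as below. This is a standalone illustration and not the nptl code: the function names are hypothetical, the guard is assumed to sit at the low end of the mapping (a downward-growing stack), and both sizes are assumed to be multiples of the page size.

```c
#define _DEFAULT_SOURCE   /* For MAP_ANONYMOUS on glibc.  */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve GUARD_SIZE + STACK_SIZE bytes with no access rights, then make
   only the part above the guard readable and writable.  Returns the start
   of the mapping (the guard), or NULL on failure.  */
void *
map_stack_prot_none_first (size_t stack_size, size_t guard_size)
{
  size_t total = guard_size + stack_size;
  char *mem = mmap (NULL, total, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;
  if (mprotect (mem + guard_size, stack_size, PROT_READ | PROT_WRITE) != 0)
    {
      munmap (mem, total);
      return NULL;
    }
  return mem;
}

/* Demo: map 1 MiB of guard plus 1 MiB of stack (arbitrary sizes) and
   write to the first usable byte.  Returns 1 on success, 0 on failure.  */
int
map_stack_demo (void)
{
  size_t guard = (size_t) 1 << 20;
  size_t stack = (size_t) 1 << 20;
  char *mem = map_stack_prot_none_first (stack, guard);
  if (mem == NULL)
    return 0;
  mem[guard] = 42;   /* First byte above the guard is writable.  */
  int ok = mem[guard] == 42;
  munmap (mem, guard + stack);
  return ok;
}
```

With this ordering the guard region is never mapped with write permission, which is the property the commit message relies on to keep it out of the commit charge.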
* benchtests: Add more tests for memrchr | H.J. Lu | 2017-06-04 | 1 | -1/+16
    bench-memchr.c is shared with bench-memrchr.c. This patch adds some tests for positions close to the beginning for memrchr, which are equivalent to positions close to the end for memchr.
    * benchtests/bench-memchr.c (do_test): Print out both length and position.
      (test_main): Also test the position close to the beginning for memrchr.
* Rename cppflags-iterator.mk to libof-iterator.mk, remove extra-modules.mk. | Zack Weinberg | 2017-05-09 | 2 | -3/+2
    cppflags-iterator.mk no longer has anything to do with CPPFLAGS; all it does is set libof-$(foo) for a list of files. extra-modules.mk does the same thing, but with a different input variable, and doesn't let the caller control the module. Therefore, this patch gives cppflags-iterator.mk a better name, removes extra-modules.mk, and updates all uses of both.
    * extra-modules.mk: Delete file.
    * cppflags-iterator.mk: Rename to ...
    * libof-iterator.mk: ...this. Adjust comments.
    * Makerules, extra-lib.mk, benchtests/Makefile, elf/Makefile
    * elf/rtld-Rules, iconv/Makefile, locale/Makefile, malloc/Makefile
    * nscd/Makefile, sunrpc/Makefile, sysdeps/s390/Makefile: Use libof-iterator.mk instead of cppflags-iterator.mk or extra-modules.mk.
    * benchtests/strcoll-inputs/filelist#en_US.UTF-8: Remove extra-modules.mk and cppflags-iterator.mk, add libof-iterator.mk.
* Change TEST_NAME to memcpy to fix IFUNC testing of multiple versions. | Steve Ellcey | 2017-03-28 | 1 | -2/+2
    * benchtests/bench-memcpy-random.c (TEST_NAME): Change to memcpy.
      (IMPL): Call with 1 instead of 0 as argument.
* Actually add bench-memcpy-random | Siddhesh Poyarekar | 2017-03-26 | 1 | -0/+157
    git-add and commit the benchmark that Wilco posted on the list.
* Add a new randomized memcpy test for copies up to 256 bytes. | Wilco Dijkstra | 2017-03-23 | 1 | -1/+1
    The distribution of the size and alignment is based on a trace of SPEC2006. Instead of repeating the same copy over and over again like the existing tests, it times several thousand different copies to more accurately estimate the overhead of branch prediction.
    * benchtests/Makefile (string-benchset): Add memcpy-random.
    * benchtests/bench-memcpy-random.c: New file.
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2017-01-01 | 76 | -76/+76
* Add configure check for python program | Siddhesh Poyarekar | 2016-12-22 | 1 | -2/+8
    Add a configure check that looks for python3 and python in that order since we had agreed in the past to prefer python3 over python in all our code. The patch also adjusts invocations through the various Makefiles to use the set variable.
    * configure.ac: Check for python3 or python.
    * configure: Regenerated.
    * config.make.in (PYTHON): New variable.
    * benchtests/Makefile: Don't define PYTHON.
      (bench): Define target only if PYTHON was defined.
    * Rules: Don't define PYTHON. Define pretty printer targets only if PYTHON was defined.
      (tests-printers): Add to tests-unsupported if PYTHON is not found.
      (python-flags, python-invoke): Remove.
      (tests-printers-out): Use PYTHON instead of python-invoke.
* This patch cleans up the strsep implementation and improves performance. | Wilco Dijkstra | 2016-12-21 | 1 | -21/+58
    Currently strsep calls strpbrk, which is now a veneer to strcspn. Calling strcspn directly is faster. Since strcspn handles a delimiter string of size 1 as a special case, this special case is not needed in strsep itself. Although this means there is a slightly higher overhead if the delimiter size is 1, all other cases are slightly faster. The overall performance gain is 5-10% on AArch64.
    The string/bits/string2.h header contains optimizations for constant delimiters of size 1-3. Benchmarking these showed similar performance for size 1 (since in all cases strchr/strchrnul is used), while size 2 and 3 can give up to 2x speedup for small input strings. However if these cases are common it seems much better to add this optimization to strcspn. So move these header optimizations to string-inlines.c.
    Improve the strsep benchmark so that it actually benchmarks something. The current version contains a delimiter character at every position in the input string, so there is very little work to do, and the extremely inefficient simple_strsep implementation appears fastest in every case. The new version has no match in the input for the fail case and a match halfway through the input for the success case. The input is then restored so that each iteration does exactly the same amount of work. Reduce the number of testcases since simple_strsep takes a lot of time now.
    * benchtests/bench-strsep.c (oldstrsep): Add old implementation.
      (do_one_test) Restore original string so iteration works.
    * string/string-inlines.c (do_test): Create better input strings.
      (test_main) Reduce number of testruns.
    * string/string-inlines.c (__old_strsep_1c): New function.
      (__old_strsep_2c): Likewise.
      (__old_strsep_3c): Likewise.
    * string/strsep.c (__strsep): Remove case of small delim string. Call strcspn directly rather than strpbrk.
    * string/bits/string2.h (__strsep): Remove define.
      (__strsep_1c): Remove.
      (__strsep_2c): Remove.
      (__strsep_3c): Remove.
      (strsep): Remove.
    * sysdeps/unix/sysv/linux/internal_statvfs.c (__statvfs_getflags): Rename to __strsep.
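The idea of calling strcspn directly can be sketched as follows. This is an illustrative reimplementation written for this log, not a copy of string/strsep.c; the names sketch_strsep and sketch_strsep_demo and the sample input are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strsep built directly on strcspn: find the end of the token
   in one call, terminate it, and advance *STRINGP past the delimiter.  */
char *
sketch_strsep (char **stringp, const char *delim)
{
  char *begin = *stringp;
  if (begin == NULL)
    return NULL;

  /* strcspn returns the length of the prefix containing no delimiter.  */
  char *end = begin + strcspn (begin, delim);
  if (*end != '\0')
    {
      *end = '\0';
      *stringp = end + 1;
    }
  else
    *stringp = NULL;   /* No delimiter left: this was the last token.  */
  return begin;
}

/* Demo: count the tokens strsep-style splitting yields for "a,b,,c".
   Empty fields count as tokens, so the tokens are "a", "b", "" and "c".  */
int
sketch_strsep_demo (void)
{
  char buf[] = "a,b,,c";
  char *rest = buf;
  int count = 0;
  while (sketch_strsep (&rest, ",") != NULL)
    count++;
  return count;
}
```

Because strcspn already returns the position where the token ends, no second scan is needed to locate the terminator, which is the saving the commit message describes relative to going through strpbrk.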
* benchtests: Add fmaxf/fminf benchmarks | Adhemerval Zanella | 2016-12-19 | 3 | -1/+50
    This patch adds fmaxf and fminf benchtests. It is based on the math/s_fmax_template.c implementation, which checks for basically four different classes:
    1. if x is greater than or equal to y.
    2. if x is less than y.
    3. if x or y is signaling.
    4. if y is nan.
    Cases 1 and 2 are used for the default inputs (by mixing normal double numbers and infinity), while cases 3 and 4 are each used for one benchmark class.
    Checked on x86_64-linux-gnu and powerpc64-linux-gnu.
    * benchtests/Makefile (bench-math): Add fminf and fmaxf.
      (CFLAGS-bench-fmaxf.c): New rule.
      (CFLAGS-bench-fminf.c): Likewise.
    * benchtests/fmaxf-inputs: New file.
    * benchtests/fminf-inputs: Likewise.
* benchtests: Add fmax/fmin benchmarks | Adhemerval Zanella | 2016-12-19 | 3 | -1/+49
    This patch adds fmax and fmin benchtests. It is based on the math/s_fmax_template.c implementation, which checks for basically four different classes:
    1. if x is greater than or equal to y.
    2. if x is less than y.
    3. if x or y is signaling.
    4. if y is nan.
    Cases 1 and 2 are used for the default inputs (by mixing normal double numbers and infinity), while cases 3 and 4 are each used for one benchmark class.
    Checked on x86_64-linux-gnu and powerpc64-linux-gnu.
    * benchtests/Makefile (bench-math): Add fmin and fmax.
      (CFLAGS-bench-fmax.c): New rule.
      (CFLAGS-bench-fmin.c): New rule.
    * benchtests/fmax-inputs: New file.
    * benchtests/fmin-inputs: Likewise.
* Adjust benchtests to new support library. | Adhemerval Zanella | 2016-12-19 | 30 | -50/+60
    This patch basically replaces the test-skeleton.c inclusion by support/test-driver.c and also makes minor adjustments in bench-string.h.
    Checked on x86_64-linux-gnu and powerpc64le-linux-gnu.
    * benchtests/bench-string.h (TEST_FUNCTION): Use name without parenthesis.
      (CMDLINE_PROCESS): Define using function instead of macro.
    * benchtests/bench-memccpy.c: Include <support/test-driver.c> instead of test-skeleton.
    * benchtests/bench-memchr.c: Likewise.
    * benchtests/bench-memcmp.c: Likewise.
    * benchtests/bench-memcpy-large.c: Likewise.
    * benchtests/bench-memcpy.c: Likewise.
    * benchtests/bench-memmem.c: Likewise.
    * benchtests/bench-memmove-large.c: Likewise.
    * benchtests/bench-memmove.c: Likewise.
    * benchtests/bench-memset-large.c: Likewise.
    * benchtests/bench-memset.c: Likewise.
    * benchtests/bench-rawmemchr.c: Likewise.
    * benchtests/bench-strcasecmp.c: Likewise.
    * benchtests/bench-strcasestr.c: Likewise.
    * benchtests/bench-strcat.c: Likewise.
    * benchtests/bench-strchr.c: Likewise.
    * benchtests/bench-strcmp.c: Likewise.
    * benchtests/bench-strcpy.c: Likewise.
    * benchtests/bench-strcpy_chk.c: Likewise.
    * benchtests/bench-strlen.c: Likewise.
    * benchtests/bench-strncasecmp.c: Likewise.
    * benchtests/bench-strncmp.c: Likewise.
    * benchtests/bench-strncpy.c: Likewise.
    * benchtests/bench-strnlen.c: Likewise.
    * benchtests/bench-strpbrk.c: Likewise.
    * benchtests/bench-strrchr.c: Likewise.
    * benchtests/bench-strsep.c: Likewise.
    * benchtests/bench-strspn.c: Likewise.
    * benchtests/bench-strstr.c: Likewise.
    * benchtests/bench-strtok.c: Likewise.
* Link benchset tests against libsupport | Siddhesh Poyarekar | 2016-12-18 | 1 | -0/+1
    Benchsets in benchtests use test-skeleton, so they too need to be linked against the new libsupport DSO.
    * benchtests/Makefile (binaries-benchset): Depend on libsupport DSO.
* Improve strtok and strtok_r performance. | Wilco Dijkstra | 2016-12-14 | 1 | -3/+31
    Instead of calling strpbrk which calls strcspn, call strcspn directly so we get the end of the token without an extra call to rawmemchr. Also avoid an unnecessary call to strcspn after the last token by adding an early exit for an empty string. Change strtok to tailcall strtok_r to avoid unnecessary code duplication.
    Remove the special header optimization for strtok_r of a 1-character constant string - both strspn and strcspn contain optimizations for this case. Benchmarking this showed similar performance in the worst case, but up to 5.5x better performance in the "found" case for large inputs.
    * benchtests/bench-strtok.c (oldstrtok): Add old implementation.
    * string/strtok.c (strtok): Change to tailcall __strtok_r.
    * string/strtok_r.c (__strtok_r): Optimize for performance.
    * string/string-inlines.c (__old_strtok_r_1c): New function.
    * string/bits/string2.h (__strtok_r): Move to string-inlines.c.
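The three points above (strcspn called directly, an early exit for an empty string, and strtok forwarding to strtok_r) can be sketched like this. It is an illustrative version written for this log, not the code in string/strtok_r.c; all sketch_* names and the sample input are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strtok_r: skip leading delimiters with strspn, then find the
   end of the token with a single strcspn call.  */
char *
sketch_strtok_r (char *s, const char *delim, char **save_ptr)
{
  if (s == NULL)
    s = *save_ptr;

  /* Early exit: nothing left, so no strspn/strcspn calls are needed.  */
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  s += strspn (s, delim);        /* Skip leading delimiters.  */
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  char *end = s + strcspn (s, delim);   /* End of the token.  */
  if (*end == '\0')
    {
      *save_ptr = end;           /* Last token: next call exits early.  */
      return s;
    }

  *end = '\0';
  *save_ptr = end + 1;
  return s;
}

/* Sketch of strtok as a thin wrapper that forwards to the _r variant.  */
char *
sketch_strtok (char *s, const char *delim)
{
  static char *olds;
  return sketch_strtok_r (s, delim, &olds);
}

/* Demo: ",,a,b,,c," splits into the three tokens "a", "b" and "c".  */
int
sketch_strtok_demo (void)
{
  char buf[] = ",,a,b,,c,";
  int count = 0;
  for (char *tok = sketch_strtok (buf, ","); tok != NULL;
       tok = sketch_strtok (NULL, ","))
    count++;
  return count;
}
```

Unlike the strsep sketch, empty fields are skipped here, because strspn consumes any run of delimiters before each token.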