Where can I find unit tests for regular expressions in multiple languages?

Exception or error:

I’m building a regex helper at http://www.debuggex.com. The amount of detail I want to show requires me to write my own parser and matcher.

To make sure my parser and matcher work correctly, I’ve written my own unit tests for the JavaScript flavor of regexes, but these only cover the edge cases I know about. I would like to use a standard test suite, and was recently pointed to http://hg.ecmascript.org/tests/test262/summary, which I will be using.

My question is: where can I find such test suites for other regex flavors? I’d like to support other flavors in the future. I have not been able to find anything by googling (“test” pollutes the results with regex testers). I am looking for test suites for Python, PHP, Perl, Java, Ruby, and .NET.

How to solve:

Most of those languages are open source. Any decent project should keep its test cases in its source repository; otherwise I would be pretty concerned.

  • Python’s regex tests
  • PHP’s regex tests
  • Perl’s regex tests (these look really extensive)
  • OpenJDK’s unit tests (an open source flavour of Java)
  • Ruby’s regex tests
  • Mono’s regex tests (open source version of .NET)
  • .NET Core’s regex tests
  • RE2’s tests (C++ regex engine developed at Google)
  • C test suite (developed by AT&T Research)
  • PCRE regex tests (Perl Compatible Regular Expressions C library)
  • JavaScript regex tests (Ecma Technical Committee 39 compatibility suite)

I also found an extensive chart on this page which might be of some help to you.
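Whichever suite you pick, you will probably end up feeding its cases to your own matcher through a small table-driven harness. Here is a minimal sketch of that idea in Python. The case format (pattern, input, expected match or None), the names `CASES`, `run_cases`, and `stdlib_search` are all made up for illustration and do not correspond to the file format of any suite listed above; each real suite would need its own loader.

```python
import re

# Hypothetical case format: (pattern, input, expected), where expected is the
# matched substring, or None when no match is expected.
CASES = [
    (r"a+b", "caaab", "aaab"),
    (r"^abc$", "abcd", None),
    (r"(\d+)-(\d+)", "tel 555-1234", "555-1234"),
    (r"colou?r", "color", "color"),
]


def run_cases(search, cases):
    """Run every case through `search` and return the list of failures."""
    failures = []
    for pattern, text, expected in cases:
        actual = search(pattern, text)
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures


def stdlib_search(pattern, text):
    """Reference matcher backed by Python's built-in re module."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


failures = run_cases(stdlib_search, CASES)
print(f"{len(CASES) - len(failures)} passed, {len(failures)} failed")
```

To test a custom engine, you would pass your own function in place of `stdlib_search`; running the same cases against the language's built-in engine first is a cheap way to check that the cases themselves were loaded correctly.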

###

To have a complete list on one page, here are the ones omitted from the accepted answer:

  • Mono’s regex tests (it’s an open source version of .NET)
  • PHP’s regex tests

###

Regex test suites for additional languages:

Bonus

  • Regfuzz (C toolkit for testing regular expression robustness using randomly generated and invalid regexes)
