Find ERE

Download from http://www.nyangau.org/findere/findere.zip.

This program is a variation on the UNIX egrep program.

It is also a simple test harness for my Extended Regular Expression and Directory Traversal libraries. As these have been implemented on a variety of operating systems, FINDERE is available on a similarly wide variety of operating systems.

Usage

usage: findere [-n ere|-N glob] [-x ere|-X glob] [-y ere|-Y glob] [-r]
               [-j|-J] [-i] [-m] [-s] [-v] [-c]
               [-f|-F] [-l] [-S] [-a|-A] [-q] [--] ere {fn}
flags: -n ere   search files whose base filenames match ere
       -N glob  search files whose base filenames match UNIX style glob
       -x ere   exclude files whose basenames match ere
       -X glob  exclude files whose basenames match UNIX style glob
       -y ere   exclude directories whose basenames match ere
       -Y glob  exclude directories whose basenames match UNIX style glob
       -r       recurse into directories
       -j -J    case insensitive pathname match, or sensitive
                (default: sensitive on UNIX)
       -i       case insensitive search
       -m       don't look for multiple matches on each line
       -s       look for shortest match (default is longest)
       -v       look for lines which don't match the ere
       -c       display count of matches, not matches themselves
       -f -F    display filenames with matches or counts, or don't
                (default: show if more than one filename given, or -r)
       -l       display line number before each match
       -S       show submatches (implies no -c, no -v and -m)
       -a -A    highlight output in colour, or don't
                (default: use ANSI, if tty and recognised terminal type)
       -q       don't show some error messages
       ere      extended regular expression
       fn       filename(s), no extension assumed (- for stdin)

The arguments can be broken down into three main categories, as detailed below.

Where to search

The fn argument(s) identify the files and/or directories to scan.

If -r is specified, any directories are recursively searched. This option is only available on platforms where DIRT has been implemented.

The -n ere or -N glob can be used to filter filenames to just those that match the pattern.

The -x ere or -X glob can be used to avoid searching files matching a pattern. eg: -X "*.bak".

The -y ere or -Y glob can be used to avoid searching directories matching a pattern. eg: -Y ".svn".

-j or -J can be used to force the pathname pattern matching to be case insensitive or case sensitive, respectively. The default is the most sensible choice for the platform, but it is acknowledged that systems can sometimes 'mount' each other's disks.
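
The way these filters combine can be sketched in Python. This is only an illustration of the idea, not FINDERE's source: the function name wanted and its parameters are invented for this example, and only the glob forms (-N, -X) are modelled.

```python
import fnmatch

def wanted(basename, include_glob=None, exclude_glob=None, ignore_case=False):
    """Hypothetical helper: does a file's base filename pass the filters?"""
    # Rough stand-in for -j: fold both the name and the glob to lower case.
    name = basename.lower() if ignore_case else basename

    def hit(glob):
        pattern = glob.lower() if ignore_case else glob
        return fnmatch.fnmatchcase(name, pattern)

    # Like -N: if an include pattern is given, the name must match it.
    if include_glob is not None and not hit(include_glob):
        return False
    # Like -X: if an exclude pattern is given, the name must not match it.
    if exclude_glob is not None and hit(exclude_glob):
        return False
    return True
```

For instance, wanted("main.c.bak", exclude_glob="*.bak") is False, and wanted("MAIN.C", include_glob="*.c") is only True when ignore_case=True is passed.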

See the Gotchas section for important clarifications on wildcards and how the recursive searching works.

How to match

-i can make the match case insensitive.

By default FINDERE finds all the matches on a given line. This may seem redundant, as a line is displayed whenever it contains at least one match. However, FINDERE can output its matches in colour and highlight each match. -m can disable the multiple matches per line feature.
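
To see why finding every match matters once highlighting is involved, here is a small Python sketch. It uses Python's re module rather than the ERE module, and the function name highlight and the choice of bold red are assumptions made for the example.

```python
import re

RED, RESET = "\x1b[1;31m", "\x1b[0m"   # ANSI escapes: bold red, then reset

def highlight(line, pattern, multiple=True):
    """Hypothetical helper: wrap matches of pattern in ANSI colour codes."""
    # count=0 tells re.sub to replace every match; count=1 only the first,
    # which is roughly the effect the text describes for -m.
    count = 0 if multiple else 1
    return re.sub(pattern, lambda m: RED + m.group(0) + RESET, line, count=count)
```

highlight("foo boo", "oo") colours both occurrences of oo, whereas highlight("foo boo", "oo", multiple=False) colours only the first.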

By default FINDERE finds the longest matches. Again, this may seem redundant, unless you consider that the results can be shown highlighted in colour. -s can be used to make it stop at the shortest match.
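
The difference between longest and shortest matching can be shown by analogy in Python. Note the assumption: Python expresses the choice per quantifier (greedy .* versus lazy .*?), whereas -s is a switch for the whole search, and exactly how the ERE module defines "longest" is not covered here.

```python
import re

line = "<b>one</b> and <b>two</b>"

# Greedy quantifier: runs on to the last </b>, like a longest match.
longest = re.search(r"<b>.*</b>", line).group(0)

# Lazy quantifier: stops at the first </b>, like a shortest match.
shortest = re.search(r"<b>.*?</b>", line).group(0)

print(longest)
print(shortest)
```

The line is displayed in both cases. What changes is how much of it would be highlighted.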

Normally a line is said to match if it matches the extended regular expression, but if -v is specified, then a line matches if it doesn't match the extended regular expression.

The -c argument can be used to cause FINDERE to display the count of matching lines, rather than the matches themselves.
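
The effect of -v and -c can be sketched as follows. This is a Python illustration using the re module, and matching_lines is a name made up for the example.

```python
import re

def matching_lines(lines, pattern, invert=False):
    """Hypothetical helper: select lines, inverting the test like -v if asked."""
    rx = re.compile(pattern)
    # A line is kept when "it failed to match" equals the invert flag.
    return [ln for ln in lines if (rx.search(ln) is None) == invert]

hosts = ["# comment", "127.0.0.1 localhost", "# another", "10.0.0.1 gateway"]

not_comments = matching_lines(hosts, "^#", invert=True)   # like -v
comment_count = len(matching_lines(hosts, "^#"))          # like -c
```

Here not_comments holds the two address lines, and comment_count is 2.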

The Extended Regular Expressions supported are those supported by the ERE module, which is obtainable from the same place FINDERE is obtained.

Displaying the matches

FINDERE can display the filename prior to each matching line. If there is only one file being matched, it doesn't do this by default. However, -f can be used to force it to do this, and -F can force it not to.

If -l is passed, FINDERE displays the line number as well.

As well as showing the matching line, FINDERE can also show submatches if -S is passed. Submatches are delimited by ( and ) in extended regular expressions. eg:

line to match              : goodbye cruel world
extended regular expression: ([a-z]+) ([a-z]+) world
submatch 1 will be         : goodbye
submatch 2 will be         : cruel
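
The same submatch example can be reproduced with Python's re module, which calls submatches "groups". This is only an analogy for what -S reports, not FINDERE's own output format.

```python
import re

m = re.search(r"([a-z]+) ([a-z]+) world", "goodbye cruel world")

# groups() returns the text captured by each ( ) pair, in order.
submatches = m.groups()
print(submatches)   # ('goodbye', 'cruel')
```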

FINDERE tries to determine whether the output is going to the screen (rather than a pipe or a file) and whether that screen can support colour output (using ANSI or Win32 Console API). If so, it will output matches highlighted in colour. This is at best a heuristic, so you can force FINDERE to use colour using -a, or not to using -A.
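
A heuristic of this kind might look like the Python sketch below. The details are assumptions: the terminal types FINDERE actually recognises are not listed here, so the check on TERM is purely illustrative, and the Win32 Console API case is not modelled at all.

```python
import os
import sys

def use_colour(force=None, stream=sys.stdout, env=os.environ):
    """Hypothetical helper: decide whether to emit ANSI colour."""
    # force=True behaves like -a, force=False like -A.
    if force is not None:
        return force
    # Otherwise guess: output must be a terminal, and TERM must look usable.
    # Treating "" and "dumb" as unusable is an assumption for this sketch.
    term = env.get("TERM", "")
    return stream.isatty() and term not in ("", "dumb")
```

When output is redirected to a file or pipe, stream.isatty() is False, so no colour codes would be written unless forced.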

As it runs, FINDERE may encounter various errors (such as "cannot open file"). It can be made to silently ignore some of these using the -q argument.

Examples

Find typedef in all C source and header files :-

findere typedef *.c *.h

The ERE aware reader might note that \<typedef\> is a better expression, as it matches the exact word typedef, and rejects where it is a part of larger words like typedefinition.
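
The effect of whole word matching can be checked in Python. Be aware of the difference in spelling: Python's re module has no \< and \> and uses \b for both word boundaries, so this is an analogy rather than ERE syntax.

```python
import re

plain = re.compile(r"typedef")
whole_word = re.compile(r"\btypedef\b")   # Python's spelling of \<typedef\>

# The plain pattern is fooled by longer words; the bounded one is not.
print(plain.search("typedefinition") is not None)        # True
print(whole_word.search("typedefinition") is not None)   # False
print(whole_word.search("typedef int x;") is not None)   # True
```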

Find typedef in all C source and header files (alternative syntax, using a glob). Internally FINDERE converts the glob to the equivalent ERE :-

findere -N "*.[ch]" typedef *
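
Glob to regular expression conversion is a common technique. Python's standard library happens to provide one in fnmatch.translate, which is used below purely as an illustration. The regular expression it produces is in Python's dialect and will not be the same text FINDERE generates internally.

```python
import fnmatch
import re

# Convert the UNIX style glob into a (Python dialect) regular expression.
ere = fnmatch.translate("*.[ch]")
rx = re.compile(ere)

for name in ("main.c", "main.h", "main.cpp"):
    print(name, rx.match(name) is not None)
```

main.c and main.h match. main.cpp does not, because the translated expression is anchored at the end of the name.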

Again, this time using an ERE to match filenames too. Note that ^ and $ anchors are implied, and that EREs give scope for much more sophisticated matching :-

findere -n ".*\.[ch]" typedef *
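
The implied anchors make a real difference, as this Python sketch shows. re.fullmatch stands in for a pattern with implied ^ and $, and re.search for the same pattern without them. The function names are invented for the example.

```python
import re

PATTERN = r".*\.[ch]"

def anchored(name):
    """The whole filename must match, as if ^ and $ were written."""
    return re.fullmatch(PATTERN, name) is not None

def unanchored(name):
    """Any part of the filename may match."""
    return re.search(PATTERN, name) is not None

print(anchored("main.cpp"))     # False
print(unanchored("main.cpp"))   # True, because "main.c" matches as a prefix
```

Without the implied anchors, the pattern would also let through files such as main.cpp and main.html.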

Look for implements in all Java source files in this directory and below :-

findere -N "*.java" -r implements *

Look for andy, case insensitive match :-

findere -i andy letter.txt

Find all the lines which aren't single line comments :-

findere -v "^#" /etc/hosts

Find passwords in various XML files. Note that we always want the filename displayed, even if there is only the one XML file present :-

findere -f "<password>[^<]+</password>" *.xml

Find password and capture it as a submatch :-

findere -f -S "<password>([^<]+)</password>" *.xml

Gotchas

Wildcard confusion

There are several kinds of wildcard in use here :-

Operating System native filename wildcard.
As used for the fn arguments. Varies depending upon the platform you're running on. These can be UNIX globs, or non-UNIX wildcards.
UNIX globs.
As used for the -N glob argument (regardless of where you are running FINDERE), and for the fn arguments (if you're running FINDERE on UNIX). These support ? to mean any character, * to mean zero or more of any character, and [ ] enclosed character sets/ranges.
non-UNIX wildcards.
As used for the fn arguments if running non-UNIX. A subset of UNIX globs. Typically supporting ? to mean any character, * to mean zero or more of any character. On 32 bit OS/2 wildcard expansions only include files, and do not include directories. On DOS, * doesn't work in the middle of the wildcard either. On 32 bit DOS, I've not been able to get any wildcard to work.
ERE
As used for the -n ere argument. Full Extended Regular Expression syntax supported.

You need to be sure you're using the right kind of pattern in the right place, otherwise unexpected results will be obtained.

fn argument

The following does not search all C files in this directory and its subdirectories :-

findere -r struct *.c

In fact, it searches all C files in this directory, and all files in any subdirectory that happens to have a name ending in .c! It may not even do this, depending on the platform (see Wildcard confusion above).
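
What happens on UNIX can be demonstrated with a short Python sketch, in which glob.glob plays the part of the shell expanding *.c before the program ever runs. The directory layout and the function names are made up for the demonstration.

```python
import glob
import os
import tempfile

def expand_like_shell(root):
    """Return what a UNIX shell would substitute for *.c inside root."""
    paths = glob.glob(os.path.join(root, "*.c"))
    return sorted(os.path.basename(p) for p in paths)

def demo():
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "main.c"), "w").close()          # a C file here
        os.mkdir(os.path.join(root, "src"))
        open(os.path.join(root, "src", "util.c"), "w").close()   # a C file below
        os.mkdir(os.path.join(root, "old.c"))                    # oddly named directory
        open(os.path.join(root, "old.c", "notes.txt"), "w").close()
        return expand_like_shell(root)

print(demo())   # ['main.c', 'old.c']
```

The program is handed main.c and the directory old.c. It never hears about src, so src/util.c is not searched, while recursing into old.c would search notes.txt.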

Just as in UNIX, using find, you'd say :-

find . -name '*.c' -exec egrep struct {} \;

When using FINDERE you can say :-

findere -N "*.c" -r struct .
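
The approach behind this command line, namely start from a directory, recurse, and filter on base filenames, can be sketched in Python. This is an illustration only and not how the Directory Traversal library is implemented. find_files and its parameters are hypothetical names, with exclude_dir_glob standing in for -Y.

```python
import fnmatch
import os

def find_files(top, include_glob, exclude_dir_glob=None):
    """Hypothetical helper: list files under top whose basenames match."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        if exclude_dir_glob is not None:
            # Pruning dirnames in place stops os.walk descending into them.
            dirnames[:] = [d for d in dirnames
                           if not fnmatch.fnmatchcase(d, exclude_dir_glob)]
        for name in filenames:
            if fnmatch.fnmatchcase(name, include_glob):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, top))
    return sorted(found)
```

find_files(".", "*.c") corresponds roughly to the files that -N "*.c" -r . would search, and passing exclude_dir_glob=".svn" as well mimics adding -Y ".svn".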

Quoting

There are so many metacharacters involved here that you're just bound to fall foul of the quoting requirements of your shell or command processor. Quoting the ere and any glob arguments, as the examples above do, is the safest habit.
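
One way to see what a POSIX style shell does to an argument is Python's shlex module, which follows the same basic quoting rules. It is only an approximation: shlex performs no wildcard expansion, and other command processors have different rules.

```python
import shlex

# With double quotes the backslash survives and reaches the program.
quoted = shlex.split(r'findere -n ".*\.[ch]" typedef')

# Without quotes the shell consumes the backslash itself.
unquoted = shlex.split(r'findere -n .*\.[ch] typedef')

print(quoted[2])     # .*\.[ch]
print(unquoted[2])   # .*.[ch]
```

In the unquoted case the ERE that arrives has lost its backslash, so the dot matches any character. A real shell might additionally try to expand the word as a filename wildcard.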

Copying

This program, including its source code, is public domain. Caveat Emptor.


This documentation is written by the FINDERE program author, Andy Key
andy.z.key@googlemail.com