
Re: [FWP] Grepping only text files



On Tue, Jun 29, 1999 at 11:33:35AM -0700, Bernie Cosell wrote:
> >>From vlb@cfcl.com  Tue Jun 29 09:29:13 1999 - sorry for the delay]
> Message-Id: <199906291619.MAA03774@mercury.rev.net>
> Organization: Roanoke Electronic Village
> 
> I can't exactly figure out why [Unix] grep has never had a command
> line switch [if not the default!] only to scan text files.  I use the
> following:
> 
> exec ("grep" , grep  { not -B ; }  @ARGV) ;

Unlike DumbOSes, Unix makes no distinction between text and binary files
("binary" files often contain text; this is what strings(1) is for).
Figuring out which is which can really only be done heuristically.
That heuristic is best done by the file(1) command.  Perl's -T makes a
good stab at the problem, though.
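
A minimal sketch of the -T approach, wrapped as a shell function.  The
name "tgrep" is made up for illustration; it assumes the first argument
is the pattern and the rest are filenames, and it passes no other
options through to grep:

```shell
# tgrep: hypothetical helper. Runs grep only on the arguments that
# Perl's -T heuristic classifies as text.
tgrep() {
    perl -e '
        my $pattern = shift @ARGV;
        # -f drops directories and missing files; -T guesses "text"
        # by inspecting the first block of each file.
        my @text = grep { -f && -T } @ARGV;
        # Nothing left to search: exit non-zero, as grep does on no match.
        exit 1 unless @text;
        # "--" stops grep from reading an odd pattern as an option.
        exec "grep", "--", $pattern, @text;
    ' -- "$@"
}
```

Something like "tgrep foo *" would then search only the files that -T
accepts.  Note that -T treats an empty file as text, which is harmless
here.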

Unfortunately, file(1) returns its information in a less than useful form:

$ file *
FWP:        English text
FWP.bin:    BinHex binary text, version 4.0
Resume.doc: data
Resume.txt: International language text
foo.tar.gz: gzip compressed data, deflated, last modified: Thu Jun 10 20:48:49 1999, os: Unix
phonetic:   directory
send.txt:   ASCII text
summarize:  perl script text

It is non-trivial to extract the filenames of text files from this
(this can probably be done better):

file * | grep '\btext\b' | perl -ne 'print m/^([^:]+):/, "\n"' | xargs grep <pattern>

It would be nice if file(1) had a -t flag that just returned the
filenames of text files.

Anyhow, the moral of the story is: use -T.

-- 

Michael G Schwern                                           schwern@pobox.com
                    http://www.pobox.com/~schwern
     /(?:(?:(1)[.-]?)?\(?(\d{3})\)?[.-]?)?(\d{3})[.-]?(\d{4})(x\d+)?/i
