
[FWP] Finding duplicated code

To: fwp@technofile.org
Subject: [FWP] Finding duplicated code
From: Peter Scott <Peter@PSDT.com>
Date: Thu, 06 Jan 2000 17:21:49 -0800

I had a half-baked idea for searching for duplicated Perl code in large 
libraries (an entirely plausible situation in one of my projects).  Of 
course, one can shoot all kinds of holes in this implementation, but it 
would work fairly well with the kind of code I have, given that a 
reasonably intelligent person would be perusing the output for sanity.

Step 1: With the whole file in a string, eliminate comments and normalize 
white space:

	s/((["'`])(?:\\.|(?!\2)[^\\])*\2)|#[^\n]*/defined $1 ? $1 : ''/egs;
	tr/ \t\r\f\n/ /s;
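
To see what Step 1 does to a small input, here is a minimal sketch.  The 
sub name strip_comments and the sample text are made up for illustration, 
and the substitution is written so that a string body stops at the same 
quote character that opened it (my assumption about the intent):

```perl
use strict;
use warnings;

# Hypothetical helper wrapping Step 1: drop comments, squeeze white space.
sub strip_comments {
    my ($code) = @_;
    # Quoted strings are captured in $1 and put back unchanged;
    # anything from a bare '#' to the end of the line is deleted.
    # (?!\2) keeps the string body from running past its closing quote.
    $code =~ s/((["'`])(?:\\.|(?!\2)[^\\])*\2)|#[^\n]*/defined $1 ? $1 : ''/egs;
    # Turn every run of white space into a single space.
    $code =~ tr/ \t\r\f\n/ /s;
    return $code;
}

my $sample = <<'END';
my $x = "a # not a comment";   # a real comment
print  $x;
END

print strip_comments($sample), "\n";
```

The '#' inside the quoted string survives, the trailing comment does not, 
and the two lines end up on one line separated by single spaces.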

Step 2: Normalize variable names (and all-caps barewords such as filehandles):

	s/(?<=[\@\$%])\w+|\b[A-Z]+\b/VARIABLE/g;
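
A small sketch of the effect, using the substitution above inside a 
hypothetical sub called normalize_names (the name and the two sample 
statements are invented for the example):

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the Step 2 substitution.
sub normalize_names {
    my ($code) = @_;
    # A name that follows a sigil, or a standalone all-caps word,
    # is replaced by the placeholder VARIABLE.
    $code =~ s/(?<=[\@\$%])\w+|\b[A-Z]+\b/VARIABLE/g;
    return $code;
}

# Two statements that differ only in the names they use.
my $first  = normalize_names('my $total = $count + @items; print OUT $total;');
my $second = normalize_names('my $sum = $n + @list; print LOG $sum;');

print "$first\n$second\n";
print "same after normalization\n" if $first eq $second;
```

Both statements collapse to the same text, which is what lets Step 3 
treat copy-and-paste code with renamed variables as a duplicate.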

Step 3: Look for duplicated strings longer than, say, 150 characters:

	print "---\n$1\n---\n" while /(.{150,}).*?\1/g;
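
The same search can be tried on a toy string with a much smaller 
threshold.  This is only a sketch: repeated_runs is a hypothetical name, 
the minimum length is passed in as a parameter, and the input is assumed 
to be normalized already (so it contains no newlines for "." to stop at):

```perl
use strict;
use warnings;

# Hypothetical helper: collect each run of at least $min characters
# that shows up again later in the same (single-line) string.
sub repeated_runs {
    my ($text, $min) = @_;
    my @found;
    # The greedy (.{$min,}) grabs the longest run at the current position
    # that still has a second copy somewhere after it.
    push @found, $1 while $text =~ /(.{$min,}).*?\1/g;
    return @found;
}

my $normalized =
    'if ($VARIABLE) { $VARIABLE++ } foo(); if ($VARIABLE) { $VARIABLE++ } bar();';

print "---\n$_\n---\n" for repeated_runs($normalized, 20);
```

On real code the backtracking in this pattern can get expensive, so a 
large library is probably best fed to it one file (or a few files) at a 
time.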

One minuscule problem: the matches printed are on the normalized string, 
which is not as useful as the input string.  So I would like to map 
backwards from a match in the normalized string to the input string.  None 
of the ways I can think of doing this make it nearly worth the effort, 
though :-)
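
One possible way to do the mapping, sketched under several assumptions: 
instead of running the substitutions over the whole string, walk the 
input token by token and remember, for every character of the normalized 
text, which stretch of the input produced it.  The names 
normalize_with_map and original_span are hypothetical, and unlike Step 2 
this sketch leaves the contents of quoted strings untouched, so it only 
approximates the three steps above:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the normalized text plus, for each of its
# characters, a [start, end) pair of offsets into the original $src.
sub normalize_with_map {
    my ($src) = @_;
    my $norm = '';
    my @origin;
    pos($src) = 0;
    while (pos($src) < length($src)) {
        my $start = pos($src);
        my $out;
        if    ($src =~ /\G#[^\n]*/gc)                        { $out = '' }
        elsif ($src =~ /\G\s+/gc)                            { $out = ' ' }
        elsif ($src =~ /\G(["'`])(?:\\.|(?!\1)[^\\])*\1/gcs) { $out = substr($src, $start, pos($src) - $start) }
        elsif ($src =~ /\G([\@\$%])\w+/gc)                   { $out = $1 . 'VARIABLE' }
        elsif ($src =~ /\G\b[A-Z]+\b/gc)                     { $out = 'VARIABLE' }
        else  { $src =~ /\G./gcs;                              $out = substr($src, $start, 1) }
        # Squeeze: never emit two spaces in a row.
        next if $out eq ' ' && $norm =~ / \z/;
        push @origin, ([$start, pos($src)]) x length($out);
        $norm .= $out;
    }
    return ($norm, \@origin);
}

# Map the normalized range [$from, $to) back to text from the original.
sub original_span {
    my ($src, $origin, $from, $to) = @_;
    my $begin = $origin->[$from][0];
    my $end   = $origin->[$to - 1][1];
    return substr($src, $begin, $end - $begin);
}

my $src = <<'END';
$total += $price;   # add it up
print OUT $total;
END

my ($norm, $origin) = normalize_with_map($src);
if ($norm =~ /print VARIABLE \$VARIABLE;/) {
    # @- and @+ hold the start and end offsets of the last match.
    print original_span($src, $origin, $-[0], $+[0]), "\n";
}
```

A match that begins or ends in the middle of a VARIABLE placeholder is 
widened to the whole original name, which seems the more useful choice 
when a person is reading the output.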

--
Peter Scott
Pacific Systems Design Technologies



Follow-Ups:
- Re: [FWP] Finding duplicated code
  - From: sthoenna@efn.org (Yitzchak Scott-Thoennes)
