
[FWP] Finding duplicated code



I had a half-baked idea for searching for duplicated Perl code in large 
libraries (an entirely plausible situation in one of my projects).  Of 
course, one can shoot all kinds of holes in this implementation, but it 
would work fairly well with the kind of code I have, given that a 
reasonably intelligent person would be perusing the output for sanity.

Step 1: With the whole file in a string, eliminate comments and normalize 
white space:

	s/((["'`])(?:\\.|(?!\2)[^\\])*\2)|#.*?\n/$1||''/eg;
	tr/ \t\r\f\n/ /s;
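
For illustration, here is a minimal sketch of Step 1 wrapped in a sub and run on a tiny sample.  The name strip_and_squeeze and the sample text are hypothetical, not part of the original idea.  The string-matching part uses (?!\2) so that the body of a quoted string stops at whichever quote character opened it.

```perl
use strict;
use warnings;

# Hypothetical helper: remove comments and squeeze whitespace (Step 1).
sub strip_and_squeeze {
    my ($code) = @_;
    # Alternative 1 captures a quoted string and puts it back unchanged.
    # Alternative 2 matches '#' through end of line and replaces it with ''.
    $code =~ s/((["'`])(?:\\.|(?!\2)[^\\])*\2)|#.*?\n/defined $1 ? $1 : ''/eg;
    # Map every whitespace character to a space and squeeze runs to one.
    $code =~ tr/ \t\r\f\n/ /s;
    return $code;
}

my $sample = <<'END';
my $x = "a # b";  # trailing comment
print $x;
END

print strip_and_squeeze($sample), "\n";
```

The '#' inside the quoted string should survive, while the trailing comment and the line breaks disappear.  Like the one-liner, this still treats things such as $#array as the start of a comment.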

Step 2: Normalize variable names:

	s/(?<=[\@\$%])\w+|\b[A-Z]+\b/VARIABLE/g;
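
A small sketch of what Step 2 does to a line of code, assuming a hypothetical wrapper named normalize_names.  Names after a sigil and all-uppercase barewords (filehandles, simple constants) all collapse to the same token, so two fragments that differ only in naming end up identical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the Step 2 substitution.
sub normalize_names {
    my ($code) = @_;
    # Replace the name after a sigil, or any all-uppercase bareword.
    $code =~ s/(?<=[\@\$%])\w+|\b[A-Z]+\b/VARIABLE/g;
    return $code;
}

print normalize_names('my $total = $count + @items; print OUT $total;'), "\n";
```

Note that \b[A-Z]+\b leaves names such as MY_CONST alone, because the underscore counts as a word character.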

Step 3: Look for duplicated strings longer than, say, 150 characters:

	print "---\n$1\n---\n" while /(.{150,}).*?\1/g;
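
The same search can be sketched as a sub with the threshold as a parameter, which makes it easy to try on a short string.  The name find_repeats, the threshold of 20, and the sample text are assumptions for the demonstration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return each substring of at least $min characters
# that occurs again later in $text (non-overlapping).
sub find_repeats {
    my ($text, $min) = @_;
    my @found;
    # The greedy group grabs as much as it can, then backtracks until the
    # same text (\1) is found again further on.
    while ($text =~ /(.{$min,}).*?\1/g) {
        push @found, $1;
    }
    return @found;
}

my $block = 'open(VARIABLE, $VARIABLE) or die;';
my $text  = "sub a { $block } sub b { $block return 1; }";

print "---\n$_\n---\n" for find_repeats($text, 20);
```

The reported match extends as far as the two copies agree, so it can include a little shared context on either side of the block.  The pattern backtracks heavily, so expect it to be slow on a large library.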

One minuscule problem: the matches printed are on the normalized string, 
which is not as useful as the input string.  So I would like to map 
backwards from a match in the normalized string to the input string.  None 
of the ways I can think of doing this comes anywhere near being worth the 
effort, though :-)
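
One possible way to do the mapping, offered only as a rough sketch: normalize token by token instead of with global substitutions, and record the source offset that produced each output character.  Everything here (normalize_with_map, the sentinel entry, the threshold of 20, the sample input) is hypothetical, and the tokenizer only approximates Steps 1 and 2.

```perl
use strict;
use warnings;

# Hypothetical helper.  Returns the normalized string plus an array ref
# $map, where $map->[$i] is the offset in $src of the token that produced
# character $i of the output.  A final sentinel entry holds length($src).
sub normalize_with_map {
    my ($src) = @_;
    my ($out, @map) = ('');
    pos($src) = 0;
    while (pos($src) < length $src) {
        my $at = pos($src);
        my $text;
        if ($src =~ /\G(["'`])(?:\\.|(?!\1)[^\\])*\1/gcs) {    # quoted string
            $text = substr $src, $at, pos($src) - $at;
        }
        elsif ($src =~ /\G#[^\n]*\n?/gc) { next }              # comment: drop
        elsif ($src =~ /\G\s+/gc) {                            # whitespace run
            $text = $out =~ / \z/ ? '' : ' ';
        }
        elsif ($src =~ /\G([\@\$%])\w+/gc) { $text = $1 . 'VARIABLE' }
        elsif ($src =~ /\G[A-Z]+\b/gc)     { $text = 'VARIABLE' }
        elsif ($src =~ /\G(?:\w+|.)/gcs) {                     # anything else
            $text = substr $src, $at, pos($src) - $at;
        }
        $out .= $text;
        push @map, ($at) x length $text;
    }
    push @map, length $src;
    return ($out, \@map);
}

my $src = <<'END';
open(IN, $file) or die;   # read input
$n = 1;
open(OUT, $other) or die; # write output
END

my ($norm, $map) = normalize_with_map($src);
while ($norm =~ /(.{20,}).*?\1/g) {
    my $len    = length $1;
    my @starts = ($-[1], $+[0] - $len);    # start of each copy in $norm
    for my $s (@starts) {
        # Source span: start of the first token up to the start of the
        # token that follows the match.
        my ($from, $to) = @{$map}[$s, $s + $len];
        print "--- source offsets $from..", $to - 1, "\n",
              substr($src, $from, $to - $from);
    }
}
```

On this sample the two spans reported should be the first and third input lines, comments included.  A match that ends in the middle of a token maps back to a span that stops just before that token.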

--
Peter Scott
Pacific Systems Design Technologies

