-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks, everybody, for your proposed solutions. I've tested three: two from Dirk Myers and one from Todd Larason.

The other solutions didn't work because I didn't explain the problem clearly enough. In fact, I shouldn't have said that

    $s1 = $pat . $s2;

because that's not always true. What *is* true is that $s1 always *contains* both $pat and $s2. Because of that I had to modify Todd Larason's answer, and Michael Budash's solution no longer works. Sorry.

All three are much, much better than my version, and very fast. I've tested them with

    my $txt = "estabamos alla, y habia un monton de gente y no " .
              "cabiamos, asi que nos fuimos a un cabaret; luego " .
              "un cabo de la guardia civil nos detuvo y nos hizo " .
              "trabajar.";
    my $pat = "ab";
    my $s2  = "*1";
    my $s1  = $pat . " ($s2)";

and also with real data.

I changed Dirk Myers's first version (the one posted on FWP) to swap the order of the data in the array. I also modified all versions to quotemeta the search part of the regexp, because it can contain regexp metacharacters (parentheses and dots are common).

The results are:

>Benchmark: timing 250000 iterations of Dirk Myers (I), Dirk Myers (II), Todd Larason ...
>Dirk Myers (I) : 28 wallclock secs (26.47 usr + 0.01 sys = 26.48 CPU)
>Dirk Myers (II): 29 wallclock secs (27.52 usr + 0.00 sys = 27.52 CPU)
>Todd Larason   : 27 wallclock secs (26.01 usr + 0.00 sys = 26.01 CPU)

Responses to individual messages follow:

On Sun, 20 Jun 1999 12:49:22 -0700, Michael Budash wrote:
>i'll give it a shot, tho it's hard without an example of the "big
>string"...

The big strings are sentences (jurisprudence from law courts), and we are replacing occurrences of organizations' names (say, "Tribunal de Cuentas del Estado") with their acronyms (TCE), except that the first occurrence has to be replaced with the complete name *plus* the acronym.
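A minimal sketch of the task described above, assuming the only requirement is "the first match of $pat becomes $s1, every later match becomes $s2". The counter variant follows Todd Larason's s///eg reply (quoted below) with $s1 swapped in for the first match; the array variant is only one possible reading of the (!$i++) subscript idea from Dirk Myers's message, not his actual code. The sub names replace_counter and replace_array are hypothetical. Both wrap $pat in \Q...\E, which applies quotemeta inside the regexp, so parentheses and dots in an organization's name are matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: the sub names below are hypothetical, not the posters' code.
# Both replace the first match of $pat with $s1 and every later one with $s2.
# \Q...\E applies quotemeta to $pat, so "(" or "." in it match literally.

# Counter variant, adapted from Todd Larason's s///eg reply to use $s1.
sub replace_counter {
    my ($text, $pat, $s1, $s2) = @_;
    my $count = 0;
    $text =~ s/\Q$pat\E/$count++ == 0 ? $s1 : $s2/eg;
    return $text;
}

# Array variant: one reading of the "(!$i++) for the subscript" idea.
# !$i++ is 1 on the first match (picks $rep[1] = $s1) and false, i.e. 0,
# on every later match (picks $rep[0] = $s2).
sub replace_array {
    my ($text, $pat, $s1, $s2) = @_;
    my @rep = ($s2, $s1);
    my $i   = 0;
    $text =~ s/\Q$pat\E/$rep[!$i++]/eg;
    return $text;
}

# Test data from the message above.
my $txt = "estabamos alla, y habia un monton de gente y no "
        . "cabiamos, asi que nos fuimos a un cabaret; luego "
        . "un cabo de la guardia civil nos detuvo y nos hizo "
        . "trabajar.";
my $pat = "ab";
my $s2  = "*1";
my $s1  = $pat . " ($s2)";

my $out = replace_counter($txt, $pat, $s1, $s2);
print "$out\n";
die "variants disagree\n" unless $out eq replace_array($txt, $pat, $s1, $s2);
```

With the law-court data, $pat would be "Tribunal de Cuentas del Estado", $s1 "Tribunal de Cuentas del Estado (TCE)" and $s2 "TCE". The substituted text is never rescanned by s///g, so $s1 may safely contain $pat.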
On Sun, 20 Jun 1999 13:12:57 -0700, Todd Larason wrote:
>$count = 0;
>$text =~ s/$pat/$count++ == 0 ? $pat . $s2 : $s2/eg;
>
>would do it more directly, no?

Yes, a lot more. I now realize I was being dense :)

>If you know that $s2 won't be in the $text until it's put there by
>this,
>$text =~ s/$pat/$s2/g;
>$text =~ s/$s2/$pat$s2/;

Alas, we don't know that.

On Sun, 20 Jun 1999 15:56:49 -0700, Larry Rosler wrote:
>$test =~ s/$pat(.*?)$pat/$s1$1$s2/;

Thanks for your input. Unfortunately, I didn't make myself clear enough, sorry. As Patrik Grip-Jansson suggested, the intent was to substitute every occurrence of $pat after the first, not just the second one.

On Sun, 20 Jun 1999 17:35:53 -0700 (PDT), Dirk Myers wrote:
>it's not much simpler than the original, but it's another way to do it

Well, perhaps no simpler, but certainly a lot faster :)

>... this could be further simplified by switching $pat1 and $pat2 in
>the array and using (!$i++) for the subscript... I'm not entirely
>convinced it would be less obfuscated that way, either...

Yes, it is a bit more obfuscated, but not so obfuscated as to be a concern. What we are using is a very short script (fewer than 500 lines, more than half of them static initialization of hashes) that has to process on the order of hundreds of thousands of sentences, ranging from a few KB to 2 or 3 MB each, so speed matters more than a little un-intuitiveness :)

/L/e/k/t/u

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.0.2i

iQA/AwUBN240PP4C0a0jUw5YEQI0iwCfZ6sBsC1GPnnJpV50B/pvTrN+PrMAmwWP
tz6DlyZzU1UjGn5NCArc5+FM
=KTsI
-----END PGP SIGNATURE-----

==== Want to unsubscribe from this list? (Don't you love us anymore?)
==== Well, if you insist... Send mail with body "unsubscribe" to
==== fwp-request@technofile.org