On Sat, May 13, 2000 at 11:21:15AM +0200, Jimmy Lantz wrote:
> Complexed perl problem.
>
> Hi, I have the following problem:
> I need to do the following operation on a (huge) data file see sample
> below :
> Strip the first row of the < & > and print it into a output file
> then I have to read the the rest of the data and to be able to
> and put it into a hash ??(if that's the best way to go) and associate
> the <PRON(pers,sing)> with the value I and <V(montr,pres)> with think
> and so forth so that I can analyse the values depending on if it's PRON
> or V or something else.

Which should be the keys and which should be the values? How are you
analyzing the values?

> The following:
> **[main <ADJ(ge)>]**
> is a match and need's to be stripped of the
> **[ & ]** and printed to the output file.
>
> Everything printed (I need to print more on each row but that I can
> handle myself) has to be delimited by
> || (double pipe)
>
> Further below is a perlprog that I started to make until I realized that
> I needed some help.
>
> <ICE-GB:S2A-016 #2:1:B>
> I <PRON(pers,sing)> think <V(montr,pres)> the <ART(def)> **[main
> <ADJ(ge)>]** things <N(com,plu)> that <PRON(rel)> I <PRON(pers,sing)>
> saw <V(cxtr,past)> as <PREP(ge)> <,> <PAUSE(short)> as <PREP(ge)>
> **[absent <ADJ(ge)>]** from <PREP(ge)> disa <UNTAG> from <PREP(ge)> work
> <N(com,sing)> with <PREP(ge)> with <PREP(ge)> disabled <ADJ(edp)> people
> <N(com,plu)> was <V(cop,past)>

How long will each of these blocks between <ICE-GB... be? It will
probably be easier to work on a whole block at once, rather than line
by line.

> <ICE-GB:S3A-051 #8:1:A>
> **[medium <ADJ(ge)>]** speed <N(com,sing)>
>
> NB! Data filerows above can vary in length (see row 2 & 4)
>
> #!/usr/bin/perl
> $faktor1 = "<ICE-GB:";
> $icedata = "icedata.data";
>
> open(FILE, "$icedata");

Don't forget to check the return values of system calls!
open(FILE, $icedata) or die "Can't open $icedata: $!\n";

> while(<FILE>) {
> $file = $_;
> chomp $file;
> if ($file =~ /$faktor1/) {
> $file =~ s/$faktor1/ /;

That part is redundant; you don't need to match twice. The substitution
itself returns true when it replaces something:

if ($file =~ s/$faktor1/ /) {

> $file =~ s/>/ /;
> print "$file\n";
> }
> elsif ($file !~ /$faktor1/ ){

That's redundant too; either /$faktor1/ matches or it doesn't.

} else {

> @db_fields = split (/>/, $file);
> foreach $field (@db_fields) { print "$field\n"; }
> }
> else{
> print "No match\n";

This block will never be entered.

> }
> }
> close(FILE);

I have some ideas on ways to do this, but I'd like to know the details
I asked about before I write any code.

Ronald
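
For reference, here is a minimal sketch of the loop body with only the control-flow points above applied: the substitution doubles as the test, and a plain else handles every other line. It is not a solution to the full tagging problem. The helper name process_line, the in-memory @sample lines, and the \Q...\E quoting of the marker are assumptions made for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the simplified if/else to one line and
# returns the strings the original loop would have printed for it.
sub process_line {
    my ($line, $marker) = @_;
    chomp $line;

    # s/// returns true when it replaced something, so a separate
    # match test beforehand is unnecessary. \Q...\E (an addition here)
    # keeps the marker from being read as regex syntax.
    if ($line =~ s/\Q$marker\E/ /) {
        $line =~ s/>/ /;    # also blank out the first '>' on the header line
        return ($line);
    }

    # Every non-header line ends up here, which is why a further
    # "No match" branch could never run.
    return split />/, $line;
}

my $marker = "<ICE-GB:";

# Sample lines taken from the data quoted in this thread.
my @sample = (
    "<ICE-GB:S3A-051 #8:1:A>\n",
    "speed <N(com,sing)>\n",
);

print "$_\n" for map { process_line($_, $marker) } @sample;
```

In the real script the lines would come from the FILE handle opened with the "or die" check shown earlier; the array here only keeps the sketch self-contained.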