Path: arizona!noao!ncar!unmvax!tut.cis.ohio-state.edu!rutgers!apple!oliveb!amdahl!rtech!gonzo!daveb
From: daveb@gonzo.UUCP (Dave Brower)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a
Summary: I feel a contest coming on
Message-ID: <620@gonzo.UUCP>
Date: 24 Mar 89 03:11:44 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <16078@cup.portal.com> <16492@mimsy.UUCP>
Reply-To: daveb@gonzo.UUCP (Dave Brower)
Distribution: na
Organization: Gonzo Media Group
Lines: 25

In article <16492@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:
>>When scanning the result of preprocessing a nontrivial C program with
>>many include files, one finds dozens (in some cases hundreds) of blank
>>lines. ... Why not eliminate them and issue a #line instead?
>
>Why bother? Typically there are at most a few tens in a row. It is
>probably faster to count 20 blank lines than to process one
>`#line 1234' directive.

Yup, true enough for compilation. It is sort of annoying, though, when you
need to look at the intermediate file to figure something out.

So, I offer this week's challenge: Smallest program that will take
"blank line" style cpp output on stdin and send to stdout a scrunched
version with appropriate #line directives. [f]lex, Yacc, [na]awk, sed,
perl, c, c++ are all acceptable. This will be an amusing exercise in
typical text massaging that can be enlightening for many people.

Is this branching out of comp.lang.c? Where should it go?

-dB
--
"I came here for an argument." "Oh.
This is getting hit on the head"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb daveb@gonzo.uucp

>From peckham@svax.cs.cornell.edu Fri Mar 24 10:03:21 1989
Path: arizona!noao!ncar!mailrus!cornell!peckham
From: peckham@svax.cs.cornell.edu (Stephen Peckham)
Newsgroups: comp.lang.c
Subject: Scrunch blank lines
Message-ID: <26389@cornell.UUCP>
Date: 24 Mar 89 17:03:21 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <16078@cup.portal.com> <16492@mimsy.UUCP> <620@gonzo.UUCP>
Sender: nobody@cornell.UUCP
Reply-To: peckham@svax.cs.cornell.edu (Stephen Peckham)
Distribution: na
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 22

In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge: Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives. [f]lex, Yacc, [na]awk, sed,
>perl, c, c++ are all acceptable. This will be an amusing exercise in
>typical text massaging that can be enlightening for many people.
>

Here's an awk program that will do the trick. Single blank lines are left
as is. Multiple blank lines are removed, and a new line directive is added.

{if (NF == 0) blanks++
 else if ($1=="#") {l_no = $2-1; f = $3; blanks = 2;}
 else {
    if (blanks > 1) print "#", l_no, f;
    else if (blanks == 1) print "";
    blanks = 0;
    print $0;
 }
 l_no++;
}

Steve Peckham

Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a
Summary: is this what may be wanted?
Message-ID: <9887@megaron.arizona.edu>
Date: 26 Mar 89 02:36:13 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <620@gonzo.UUCP>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 76

> In article <620@gonzo.UUCP>, daveb@gonzo.UUCP (Dave Brower) writes:
> So, I offer this week's challenge: Smallest program that will take
> "blank line" style cpp output on stdin and send to stdout a scrunched
> version with appropriate #line directives. [f]lex, Yacc, [na]awk, sed,
> perl, c, c++ are all acceptable. This will be an amusing exercise in
> typical text massaging that can be enlightening for many people.

"Scrunching" is probably a matter of taste, with regard to the format
of the output. So I am not sure what you, yourself, want. But below
is a guess. Lex, of course. May not be portable, but it should work
with minor mods on other Unices. Should be easy to modify for different
output format.

John Rupley
rupley!local@megaron.arizona.edu

%{
/*---------------------------start of text---------------------------*/
/*-
 * SCRUNCH.l
 *
 * Scrunch cpp output.
 * In-Reply-To: daveb@gonzo.UUCP (Dave Brower)
 * Message-ID: <620@gonzo.UUCP> #comp.lang.c
 *
 * Compress runs of "#" lines and blank lines, or runs of two or more
 * blank lines:
 *      (\n*# lineno "file"\n+)* or \n\n\n+
 * into a single line:
 *      #line lineno "file"\n
 * which is output before the next line of program text
 * (corresponding to line "lineno" of the source "file").
 * The values of "lineno" and "file" are adjusted for changes in
 * source resulting from #include statements.
 * Lines with whitespace are not considered blank and are passed.
 *
 * Compilation:
 *      lex scrunch.l
 *      cc -O lex.yy.c -ll -o scrunch
 *
 * Minimally tested with UNIX sys5r2 cpp only, as follows:
 * (a) /lib/cpp -Dprocessor=1 lex.yy.c >scrunch.cpp   #specify your processor
 *     scrunch <scrunch.cpp >scrunch.cpp.c
 *     cc -O scrunch.cpp.c -ll
 *     cmp -l a.out scrunch                           #should give date/name diffs only
 * (b) compare line numbers in scrunch.cpp.c with lex.yy.c and scrunch.cpp
 *     (no differences stood out)
 *
 * Possible bugs:
 *      escaped newlines in macros.
 *      ????
 *
 * John Rupley
 * rupley!local@megaron.arizona.edu
 */
%}
        char file[BUFSIZ];
POUND   #[ ]+[0-9]+[ ]+\".*$
TEXT    [^#\n].*$
%START POUND TEXT
%%
.                 {unput(yytext[0]); BEGIN TEXT;}
<POUND>{POUND}    sscanf(yytext, "# %d %s", &yylineno, &file[0]);
<POUND>{TEXT}     {printf("#line %d %s\n", yylineno-1, file); ECHO; BEGIN TEXT;}
<POUND>\n         ;
<TEXT>{POUND}     {sscanf(yytext, "# %d %s", &yylineno, &file[0]); BEGIN POUND;}
<TEXT>\n{3,}      {printf("\n"); BEGIN POUND;}
<TEXT>{TEXT}|\n   ECHO;
.                 printf("\nERROR: file %s, line %d, char 0x%x=%c\n", file, yylineno, (unsigned int) yytext[0], yytext[0]);
%%
/*----------------------------end of text-------------------------------*/

From daveb@gonzo.UUCP Tue Mar 28 10:14:59 1989
Path: arizona!noao!ncar!ames!pacbell!rtech!gonzo!daveb
From: daveb@gonzo.UUCP (Dave Brower)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: BZZT. Wrong answer
Message-ID: <623@gonzo.UUCP>
Date: 28 Mar 89 17:14:59 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP> <6839@cg-atla.UUCP>
Reply-To: daveb@gonzo.UUCP (Dave Brower)
Distribution: na
Organization: Gonzo Media Group
Lines: 27

In article <6839@cg-atla.UUCP> duane@cg-atla.UUCP (Andrew Duane) writes:
>> In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>>So, I offer this week's challenge: Smallest program that will take
>>"blank line" style cpp output on stdin and send to stdout a scrunched
>>version with appropriate #line directives. [f]lex, Yacc, [na]awk, sed,
>>perl, c, c++ are all acceptable.
>
>If shell scripts are acceptable, how about:
>
>       #!/bin/sh
>       cat -s
>
>You may have to use "more" rather than cat. The moral: please
>don't reinvent the wheel [1/2 ;-)]

Sorry, you leapt at the naive and incorrect solution. Please say "with
appropriate #line directives." Cat -s obfuscates matching the output
lines with the input lines. That is the point of the challenge.

I have two entries so far, one in "lex" and another in "awk". Both are
less than 20 lines. It will be interesting to compare timings between
awk, gawk, nawk, lex and flex.

-dB
--
"I came here for an argument." "Oh. This is getting hit on the head"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb daveb@gonzo.uucp

Path: arizona!noao!ncar!boulder!sunybcs!rutgers!njin!princeton!phoenix!bernsten
From: bernsten@phoenix.Princeton.EDU (Dan Bernstein)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: sed
Message-ID: <7472@phoenix.Princeton.EDU>
Date: 29 Mar 89 21:56:36 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP> <6839@cg-atla.UUCP> <623@gonzo.UUCP>
Reply-To: bernsten@phoenix.Princeton.EDU (Dan Bernstein)
Distribution: na
Organization: Princeton U. Undergrad Math Majors, last time I checked
Lines: 35

Dave Brower asks for a filter ``that will take "blank line" style cpp
output on stdin and send to stdout a scrunched version with appropriate
#line directives.'' If we may combine built-in utilities to handle the
problem, then this 9-line shell script will do it (combine the last
two lines to make it 8):

  #!/bin/sh
  ( tr XY '\375\376' | sed 's/^\(.\)\(.*\)/X\1\2Y/
  tend
  i\
  X#line
  d
  :end
  =' | uniq | tr '\012X' ' \012'; echo ''; )
  | sed 's/Y.*//' | tr '\375\376' XY | sed -n '1!p'

The idea is reasonably simple; one could use, e.g., grep -n '.' to obtain
a similar solution. This particular version destroys any \375 and \376
you may have in your source, and because it's based on sed, it omits the
final line if it has no newline.
It has been tested successfully on a wide variety of sources, and I must
say the next time I feel compelled to look at cpp output, I'll definitely
use it.

> I have two entries so far, one in "lex" and another in "awk". Both are
> less than 20 lines. It will be interesting to compare timings between
> awk, gawk, nawk, lex and flex.

Ahem? Are we forgetting sed here? (Then again, I hate awk, love sed,
and prefer C to lex. I'd rather have a sed script twice as slow as an
awk script. But that's just personal bias.)

If you time, make sure to test out on really long sources too. I'd hate
to see my script penalized just because it totals eight+sh execs :-).

---Dan Bernstein, bernsten@phoenix.princeton.edu

Path: arizona!noao!ncar!unmvax!tut.cis.ohio-state.edu!ukma!husc6!ogccse!littlei!omepd!merlyn
From: merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge)
Newsgroups: comp.lang.c
Subject: FIXED Perl line-scruncher solution (was Re: Scrunch blank lines)
Summary: arrrgh
Keywords: messedup
Message-ID: <4260@omepd.UUCP>
Date: 30 Mar 89 21:59:43 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP> <6839@cg-atla.UUCP> <623@gonzo.UUCP> <4257@omepd.UUCP>
Sender: news@omepd.UUCP
Reply-To: merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge)
Distribution: na
Organization: Stonehenge; netaccess via BiiN, Hillsboro, Oregon, USA
Lines: 39
Posted: Thu Mar 30 13:59:43 1989
In-reply-to: myself :-)

In article <4257@omepd.UUCP>, a rather dingy merlyn@intelob writes:
| Here's my solution, just 5 lines of (mostly readable :-) Perl...
|
| #!/usr/bin/perl
| for ($file = "stdin", $line = 1; $_ = <STDIN>;) {
|   ($line = $1, $file = $2, $sync = 0, next) if /^#\W*(\d+)\W+"(.*)"\W*$/;
|   ($sync = 0, $line++, next) if /^\W*$/;
|   printf "#line %d \"%s\"\n", $line, $file unless $sync++;
|   print; $line++;
| }

Arrrgh. Someone should beat into my head that non-word != whitespace.
All the '\W' up there should be '\s'.
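
To make the difference concrete, here is a minimal sketch in plain sh.
The two helper names are made up for this illustration, and grep bracket
expressions stand in for Perl's \W and \s, so treat it as an approximation
rather than a translation of the script above. A line holding only "}"
contains no word characters, so the \W test calls it blank and the brace
would be dropped; the \s test keeps it.

```sh
#!/bin/sh
# Hypothetical helpers (not part of the posted script) contrasting the two tests.

# Rough equivalent of Perl's /^\W*$/ : no letters, digits or underscores.
only_nonword() {
    printf '%s\n' "$1" | grep -q '^[^A-Za-z0-9_]*$'
}

# Rough equivalent of Perl's /^\s*$/ : nothing but whitespace.
only_space() {
    printf '%s\n' "$1" | grep -q '^[[:space:]]*$'
}

# Show how each test classifies a few sample lines.
for line in '' '   ' '}' '};' 'x = 1;'; do
    if only_nonword "$line"; then nw=yes; else nw=no; fi
    if only_space "$line"; then sp=yes; else sp=no; fi
    printf '[%s] nonword-only=%s space-only=%s\n' "$line" "$nw" "$sp"
done
```

The lines "}" and "};" are the interesting cases: they come out as
nonword-only=yes but space-only=no.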
Anyway, for those of you that want the "new and improved" version, try
this (I call it "seepp"):

---------------------------------------- cut here --------------------
#!/usr/bin/perl
open(CPP,$a = "/lib/cpp " . join(" ",@ARGV) . "|") ||
        die "Cannot exec '$a' ($!)\n";
for ($file = "/dev/null", $line = 1; $_ = <CPP>;) {
        ($file = $1, $line = $2, $sync = 0, next) if /^#\s*"(.*)"\s+(\d+)\s*$/;
        ($file = $1, $line++, $sync = 0, next) if /^#\s*"(.*)"\s*$/;
        ($line = $1, $file = $2, $sync = 0, next) if /^#\s*(\d+)\s+"(.*)"\s*$/;
        ($line = $1, $sync = 0, next) if /^#\s*(\d+)\s*$/;
        ($sync = 0, $line++, next) if /^\s*$/;
        printf "#line %d \"%s\"\n", $line, $file unless $sync++;
        print; $line++;
}
close(CPP) || die "Cannot close '$a' ??? ($!)\n";
exit(0);
---------------------------------------- cut here --------------------

(side note to Larry: if you wanna stick this in 'eg', go ahead...)

Probably should have had lunch first,
--
/ Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 \
| on contract to BiiN (for now :-) Hillsboro, Oregon, USA. |
|<@intel-iwarp.arpa:merlyn@intelob.intel.com> ...!uunet!tektronix!biin!merlyn |
\ Cute quote: "Welcome to Oregon... home of the California Raisins!"
/

Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: mice and mountains
Message-ID: <9996@megaron.arizona.edu>
Date: 30 Mar 89 23:43:56 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <7472@phoenix.Princeton.EDU>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 45

In article <7472@phoenix.Princeton.EDU>, bernsten@phoenix.Princeton.EDU (Dan Bernstein) writes:
> Dave Brower asks for a filter ``that will take "blank line" style cpp
> output on stdin and send to stdout a scrunched version with appropriate
                                                              ^^^^^^^^^^^
> #line directives.'' If we may combine built-in utilities to handle the
> problem, then this 9-line shell script will do it (combine the last
> two lines to make it 8):
>
> #!/bin/sh
> ( tr XY '\375\376' | sed 's/^\(.\)\(.*\)/X\1\2Y/
> tend
> i\
> X#line
> d
> :end
> =' | uniq | tr '\012X' ' \012'; echo ''; )
> | sed 's/Y.*//' | tr '\375\376' XY | sed -n '1!p'

I am not sure this is what the original poster wanted, i.e., ``appropriate''
may refer to #line directives with line numbers that reference the
source file, not the cpp output.

Regardless, the above script is truly trivial in Lex:

%%
\n\n+   printf("\n#line %d \n", yylineno);
.|\n    ECHO;

> Ahem? Are we forgetting sed here? (Then again, I hate awk, love sed,
> and prefer C to lex. I'd rather have a sed script twice as slow as an
> awk script. But that's just personal bias.)

How could one forget sed (:-)? But for matching patterns that cross
line boundaries, Lex is a natural, because it sees a file as a stream
of characters rather than as a stream of records. Sed and awk are
record-based and thus seem forced for multi-line matching.

Prefer C to Lex? Hmmm... Lex is just the machinery for a pattern-based
switch statement, with the user supplying ``case'' statements written in C.
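
For comparison, a record-based tool has to carry that state from one
record to the next by hand. Here is a rough sketch, assuming awk wrapped in
a shell function (the name scrunch_runs is invented here). Much like the
two-rule Lex scanner above, it replaces every run of blank lines with a
#line directive that numbers lines of the cpp output, not of the original
source.

```sh
#!/bin/sh
# Hypothetical sketch: collapse each run of empty lines in stdin into a
# "#line N" directive, where N is the input line number of the next text line.
scrunch_runs() {
    awk '
        /^$/    { pending = 1; next }                       # inside a blank run
        pending { printf "#line %d\n", NR; pending = 0 }    # run just ended
                { print }                                   # pass text through
    '
}

# Demo: three blank lines between two declarations.
printf 'int a;\n\n\n\nint b;\n' | scrunch_runs
# prints:
#   int a;
#   #line 5
#   int b;
```

The "pending" flag is the state that Lex keeps implicitly by matching the
whole run as a single token.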
John Rupley
rupley!local@megaron.arizona.edu

Path: arizona!noao!ncar!ames!lll-winken!uunet!munnari!otc!metro!bunyip!uqcspe!qfagus!anvil!michi
From: michi@anvil.oz (Michael Henning)
Newsgroups: comp.bugs.sys5
Subject: lex bug ?
Keywords: lex, parsing
Message-ID: <294@anvil.oz>
Date: 31 Mar 89 06:35:08 GMT
Organization: Anvil Designs Pty Ltd, Brisbane, Australia
Lines: 47

I came across the following today. The problems show up both under AIX and
Xenix. Could anyone please enlighten me as to whether these are real bugs
or am I just overlooking something ?

Problem 1:

The following lex input file recognizes comments as a '*' at the beginning
of a line followed by any number of characters to the end of the line:

%%
^\*.*$          printf("comment: %s\n", yytext);

If the file is changed to

comment         ^\*.*$
%%
{comment}       printf("comment: %s\n", yytext);

then comments are no longer recognised.

Problem 2:

The following lex input file is supposed to recognize empty lines.

{empty_line}    ^$
%%
{empty_line}    printf("empty line\n");

The program compiles, but does not recognize empty lines. If the program
is changed to

%%
^$              printf("empty line\n");

then lex reports a syntax error on line 2.

Any help on these would be greatly appreciated, I have not used lex before,
and the documentation is somewhat terse in places...

Michi.
--
| The opinions expressed are my own, not those of my employer.
|
| |
| Michael (Michi) Henning |
| - We have three Michaels here, that's why they call me Michi |

Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: Lex solution
Message-ID: <10029@megaron.arizona.edu>
Date: 31 Mar 89 21:13:38 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 25

>From rupley!local Fri Mar 31 13:43:14 1989

In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge: Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.

The following Lex source is somewhat shorter than a previous Lex version.
Specifications assumed: single blank lines, as well as runs of blank
lines +- <#> line directives, are to be replaced by <# lineno "filename">;
only truly blank lines (no space or tab) are to be considered blank.

------------------------------------------------------------------------
        char f[80];
%S P
%%
#.+\n   {sscanf(yytext,"#%d%s",&yylineno,f);BEGIN P;}

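<P>.+\n {printf("# %d %s\n",yylineno-1,f);ECHO;BEGIN 0;}
\n      BEGIN P;
.+\n    ECHO;
------------------------------------------------------------------------

John Rupley
rupley!local@megaron.arizona.edu

Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: yet another Lex version
Message-ID: <10031@megaron.arizona.edu>
Date: 1 Apr 89 01:57:42 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <10029@megaron.arizona.edu>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 20

In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge: Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.

Yet another Lex version:

------------------------------------------------------------------------
        char f[80]; int x;
%%
#.+\n   {sscanf(yytext,"#%d%s",&yylineno,f); x++;}
.+\n    {if(x)printf("# %d %s\n",yylineno-1,f); ECHO; x=0;}
\n      x++;
------------------------------------------------------------------------

John Rupley
rupley!local@megaron.arizona.edu

From local Mon Apr 3 20:57 MST 1989
To: arizona!uunet!munnari!otc!metro!bunyip!uqcspe!qfagus!anvil!michi
Subject: Re: lex bug ?
Cc: local
Status: R

Nope -- no bug.

> I came across the following today. The problems show up both under AIX and
> Xenix. Could anyone please enlighten me as to whether these are real bugs
> or am I just overlooking something ?
> Problem 1:
> The following lex input file recognizes comments as a '*' at the beginning
> of a line followed by any number of characters to the end of the line:
> %%
> ^\*.*$ printf("comment: %s\n", yytext);
> If the file is changed to
> comment ^\*.*$
> %%
> {comment} printf("comment: %s\n", yytext);
> then comments are no longer recognised.

I had hoped to see an answer on the net -- but none appeared. Your
questions are good, but you should have answered them yourself, by
experimenting.

Several remarks. First, regarding comments, the algorithm you use is
wrong, at least for C. But this is not the point.

Your question is, I believe, why cannot ^...$ be in a definition?
The answer is, in the Rules section <^> must be the first character
on a line and <$> must be the last character before a pattern break
(space, |, or whatever). If you use a definition, then it is
surrounded by {....}, and clearly you cannot get the expected effect
of ^ and $. This is undocumented but reasonable, when you think about it.

> Problem 2:
> The following lex input file is supposed to recognize empty lines.
> {empty_line} ^$
> %%
> {empty_line} printf("empty line\n");
> The program compiles, but does not recognize empty lines. If the program

See above remarks...

> is changed to
> %%
> ^$ printf("empty line\n");
> then lex reports a syntax error on line 2.

Hmm -- wouldn't you feel embarrassed if you expected a pattern between
<^> and <$> and found none?

Lex is indeed powerful, but it pays to be explicit -- if you mean
end-of-line, say so; and if you want the other stuff to be printed,
say so (else you may get some odd behavior). Better, try:

%%
^\n     ;
.|\n    ECHO;

John Rupley
 uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
 (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929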
Your questions are good, but you should have answered them youself, by experimenting. Several remarks. First, regarding comments, the algorithm you use is wrong, at least for C. But this is not the point. Your question is, I believe, why cannot ^...$ be in a definition? The answer is, in the Rules section <^> must be the first character on a line and <$> must be the last character before a pattern break (space, |, or whatever). If you use a definition, then it is surrounded by {....}, and clearly you cannot get the expected effect of ^ and $. This is undocumented but reasonable, when you think about it. > Problem 2: > The following lex input file is supposed to recognize empty lines. > {empty_line} ^$ > %% > {empty_line} printf("empty line\n"); > The program compiles, but does not recognize empty lines. If the program See above remarks... > is changed to > %% > ^$ printf("empty line\n"); > then lex reports a syntax error on line 2. Hmm -- wouldn't you feel embarrassed if you expected a pattern between <^> and <$> and found none? Lex is indeed powerful, but it pays to be explicit -- if you mean end-of-line, say so; and if you want the other stuff to be printed, say so (else you may get some odd behavior). Better, try: %% ^\n ; .|\n ECHO; John Rupley uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local internet: rupley!local@megaron.arizona.edu (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929