Path: arizona!noao!ncar!unmvax!tut.cis.ohio-state.edu!rutgers!apple!oliveb!amdahl!rtech!gonzo!daveb
From: daveb@gonzo.UUCP (Dave Brower)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a
Summary: I feel a contest coming on
Message-ID: <620@gonzo.UUCP>
Date: 24 Mar 89 03:11:44 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <16078@cup.portal.com> <16492@mimsy.UUCP>
Reply-To: daveb@gonzo.UUCP (Dave Brower)
Distribution: na
Organization: Gonzo Media Group
Lines: 25

In article <16492@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:
>>When scanning the result of preprocessing a nontrivial C program with 
>>many include files, one finds dozens (in some cases hundreds) of blank
>>lines. ... Why not eliminate them and issue a #line instead?
>
>Why bother?  Typically there are at most a few tens in a row.  It is
>probably faster to count 20 blank lines than to process one
>`#line 1234' directive.

Yup, true enough for compilation.  It is sort of annoying though when you
need to look at the intermediate file to figure something out.

So, I offer this week's challenge:  Smallest program that will take
"blank line" style cpp output on stdin and send to stdout a scrunched
version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
perl, c, c++ are all acceptable.  This will be an amusing exercise in
typical text massaging that can be enlightening for many people.

Is this branching out of comp.lang.c?  Where should it go?

-dB
-- 
"I came here for an argument." "Oh.  This is getting hit on the head"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb@gonzo.uucp


>From peckham@svax.cs.cornell.edu Fri Mar 24 10:03:21 1989
Path: arizona!noao!ncar!mailrus!cornell!peckham
From: peckham@svax.cs.cornell.edu (Stephen Peckham)
Newsgroups: comp.lang.c
Subject: Scrunch blank lines
Message-ID: <26389@cornell.UUCP>
Date: 24 Mar 89 17:03:21 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <16078@cup.portal.com> <16492@mimsy.UUCP> <620@gonzo.UUCP>
Sender: nobody@cornell.UUCP
Reply-To: peckham@svax.cs.cornell.edu (Stephen Peckham)
Distribution: na
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 22

In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
>perl, c, c++ are all acceptable.  This will be an amusing exercise in
>typical text massaging that can be enlightening for many people.
>
Here's an awk program that will do the trick.  Single blank lines are left as
is.  Runs of two or more blank lines, and cpp's own "# lineno file" lines, are
removed, and a fresh "# lineno file" directive is printed before the next line
of text.

{if (NF == 0) blanks++
 else if ($1=="#") {l_no = $2-1; f = $3; blanks = 2;}
 else {
	if (blanks > 1) print "#", l_no, f;
	else if (blanks == 1) print "";
	blanks = 0;
	print $0;
      }
 l_no++;
}	
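For readers following the thread in C, here is a rough line-by-line translation of the awk program above. It is a sketch, not part of the original post: it assumes cpp marker lines look like `# 12 "file.h"` with the `#` as a separate field (which is what the awk test `$1=="#"` requires), it works on an in-memory string rather than stdin, and the names `scrunch`, `put` and `struct outbuf` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: output buffer that copies what fits and stays
   NUL-terminated (cap must be > 0). */
struct outbuf { char *p; size_t len, cap; };

static void put(struct outbuf *b, const char *s, size_t n)
{
    while (n-- > 0 && b->len + 1 < b->cap)
        b->p[b->len++] = *s++;
    b->p[b->len] = '\0';
}

/*
 * Sketch of the awk program in C; l_no, f and blanks keep their awk
 * meanings.  Assumes markers of the form  # N "file".  in holds the whole
 * cpp output; out receives the scrunched text.  Input lines longer than
 * 1023 bytes are truncated.
 */
void scrunch(const char *in, char *out, size_t cap)
{
    struct outbuf b = { out, 0, cap };
    char line[1024], first[3], f[256] = "", dir[300];
    long l_no = 0;
    int blanks = 0;

    out[0] = '\0';
    while (*in != '\0') {
        const char *nl = strchr(in, '\n');
        size_t n = nl ? (size_t)(nl - in) : strlen(in);
        size_t m = n < sizeof line - 1 ? n : sizeof line - 1;

        memcpy(line, in, m);
        line[m] = '\0';
        in += n + (nl != NULL);

        if (sscanf(line, "%2s", first) != 1) {          /* NF == 0 */
            blanks++;
        } else if (strcmp(first, "#") == 0) {           /* $1 == "#" */
            long num = 0;
            f[0] = '\0';
            sscanf(line, "%*s %ld %255s", &num, f);     /* $2 and $3 */
            l_no = num - 1;
            blanks = 2;                                 /* force a directive */
        } else {
            if (blanks > 1) {
                snprintf(dir, sizeof dir, "# %ld %s\n", l_no, f);
                put(&b, dir, strlen(dir));
            } else if (blanks == 1) {
                put(&b, "\n", 1);                       /* keep a lone blank */
            }
            blanks = 0;
            put(&b, line, m);
            put(&b, "\n", 1);
        }
        l_no++;
    }
}
```

Given a marker `# 1 "a.c"`, the line `int x;`, three empty lines, `int y;`, one empty line and `int z;`, the single blank line survives while the run of three is replaced by `# 5 "a.c"`, since `int y;` is line 5 of a.c.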

Steve Peckham


Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a
Summary: is this what may be wanted?
Message-ID: <9887@megaron.arizona.edu>
Date: 26 Mar 89 02:36:13 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <620@gonzo.UUCP>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 76


> In article <620@gonzo.UUCP>, daveb@gonzo.UUCP (Dave Brower) writes:
> So, I offer this week's challenge:  Smallest program that will take
> "blank line" style cpp output on stdin and send to stdout a scrunched
> version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
> perl, c, c++ are all acceptable.  This will be an amusing exercise in
> typical text massaging that can be enlightening for many people.

"Scrunching" is probably a matter of taste, with regard to the format
of the output.  So I am not sure what you, yourself, want.  But below
is a guess.  Lex, of course.  May not be portable, but it should work
with minor mods on other Unices.  Should be easy to modify for different
output format.

John Rupley
rupley!local@megaron.arizona.edu


%{ /*---------------------------start of text---------------------------*/
/*-
 * SCRUNCH.l
 *
 * Scrunch cpp output.
 * 	In-Reply-To: daveb@gonzo.UUCP (Dave Brower)
 * 	Message-ID: <620@gonzo.UUCP>			#comp.lang.c
 * 
 * Compress runs of "#" lines and blank lines, or runs of two or more
 * blank lines:
 * 	(\n*# lineno "file"\n+)*  or  \n\n\n+
 * into a single line:
 *	#line lineno "file"\n
 * which is output before the next line of program text 
 * (corresponding to line "lineno" of the source "file").
 * The values of "lineno" and "file" are adjusted for changes in
 * source resulting from #include statements.
 * Lines with whitespace are not considered blank and are passed.
 *
 * Compilation:
 *	lex scrunch.l
 *	cc -O lex.yy.c -ll -o scrunch
 *
 * Minimally tested with UNIX sys5r2 cpp only, as follows:
 * (a)	/lib/cpp -Dprocessor=1 lex.yy.c >scrunch.cpp	#specify your processor
 *	scrunch <scrunch.cpp >scrunch.cpp.c
 *	cc -O scrunch.cpp.c -ll
 *	cmp -l a.out scrunch		#should give date/name diffs only
 * (b)	compare line numbers in scrunch.cpp.c with lex.yy.c and scrunch.cpp
 *		(no differences stood out)
 *
 * Possible bugs:
 *	escaped newlines in macros.
 *	????
 *
 * John Rupley
 * rupley!local@megaron.arizona.edu
 */
%}
	char		file[BUFSIZ];

POUND	#[ ]+[0-9]+[ ]+\".*$
TEXT	[^#\n].*$
%START	POUND TEXT
%%
<INITIAL>.	{unput(yytext[0]); BEGIN TEXT;}
<POUND>{POUND}	sscanf(yytext, "# %d %s", &yylineno, &file[0]);
<POUND>{TEXT}	{printf("#line %d %s\n", yylineno-1, file); ECHO; BEGIN TEXT;}
<POUND>\n	;
<TEXT>{POUND}	{sscanf(yytext, "# %d %s", &yylineno, &file[0]); BEGIN POUND;}
<TEXT>\n{3,}	{printf("\n"); BEGIN POUND;}
<TEXT>{TEXT}|\n	ECHO;
.		printf("\nERROR: file %s, line %d, char 0x%x=%c\n",
			file, yylineno, (unsigned int) yytext[0], yytext[0]);
%%
/*----------------------------end of text-------------------------------*/


From daveb@gonzo.UUCP Tue Mar 28 10:14:59 1989
Path: arizona!noao!ncar!ames!pacbell!rtech!gonzo!daveb
From: daveb@gonzo.UUCP (Dave Brower)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: BZZT. Wrong answer
Message-ID: <623@gonzo.UUCP>
Date: 28 Mar 89 17:14:59 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP> <6839@cg-atla.UUCP>
Reply-To: daveb@gonzo.UUCP (Dave Brower)
Distribution: na
Organization: Gonzo Media Group
Lines: 27

In article <6839@cg-atla.UUCP> duane@cg-atla.UUCP (Andrew Duane) writes:
>> In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>>So, I offer this week's challenge:  Smallest program that will take
>>"blank line" style cpp output on stdin and send to stdout a scrunched
>>version with appropriate #line directives.  [f]lex, Yacc, [na]awk, sed,
>>perl, c, c++ are all acceptable.
>
>If shell scripts are acceptable, how about:
>
>	#!/bin/sh
>	cat -s
>
>You may have to use "more" rather than cat. The moral: please
>don't reinvent the wheel [1/2 ;-)]

Sorry, you leapt at the naive and incorrect solution.  Please note the words
"with appropriate #line directives."  Cat -s makes it impossible to match
the output lines with the input lines, and that matching is the point of
the challenge.

I have two entries so far, one in "lex" and another in "awk".  Both are
less than 20 lines.  It will be interesting to compare timings between
awk, gawk, nawk, lex and flex.

-dB
-- 
"I came here for an argument." "Oh.  This is getting hit on the head"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb@gonzo.uucp


Path: arizona!noao!ncar!boulder!sunybcs!rutgers!njin!princeton!phoenix!bernsten
From: bernsten@phoenix.Princeton.EDU (Dan Bernstein)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: sed
Message-ID: <7472@phoenix.Princeton.EDU>
Date: 29 Mar 89 21:56:36 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP> <6839@cg-atla.UUCP> <623@gonzo.UUCP>
Reply-To: bernsten@phoenix.Princeton.EDU (Dan Bernstein)
Distribution: na
Organization: Princeton U. Undergrad Math Majors, last time I checked
Lines: 35

Dave Brower asks for a filter ``that will take "blank line" style cpp
output on stdin and send to stdout a scrunched version with appropriate
#line directives.'' If we may combine built-in utilities to handle the
problem, then this 9-line shell script will do it (combine the last
two lines to make it 8):

  #!/bin/sh
  ( tr XY '\375\376' | sed 's/^\(.\)\(.*\)/X\1\2Y/
  tend
  i\
  X#line
  d
  :end
  =' | uniq | tr '\012X' ' \012'; echo ''; ) |
  sed 's/Y.*//' | tr '\375\376' XY | sed -n '1!p'

The idea is reasonably simple; one could use, e.g., grep -n '.' to
obtain a similar solution. This particular version destroys any \375 and
\376 you may have in your source, and because it's based on sed, it omits
the final line if it has no newline. It has been tested successfully on
a wide variety of sources, and I must say the next time I feel compelled
to look at cpp output, I'll definitely use it.

> I have two entries so far, one in "lex" and another in "awk".  Both are
> less than 20 lines.  It will be interesting to compare timings between
> awk, gawk, nawk, lex and flex.

Ahem? Are we forgetting sed here? (Then again, I hate awk, love sed,
and prefer C to lex. I'd rather have a sed script twice as slow as an
awk script. But that's just personal bias.)

If you time them, be sure to test on really long sources too. I'd hate
to see my script penalized just because it totals eight+sh execs :-).

---Dan Bernstein, bernsten@phoenix.princeton.edu


Path: arizona!noao!ncar!unmvax!tut.cis.ohio-state.edu!ukma!husc6!ogccse!littlei!omepd!merlyn
From: merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge)
Newsgroups: comp.lang.c
Subject: FIXED Perl line-scruncher solution (was Re: Scrunch blank lines)
Summary: arrrgh
Keywords: messedup
Message-ID: <4260@omepd.UUCP>
Date: 30 Mar 89 21:59:43 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP> <6839@cg-atla.UUCP> <623@gonzo.UUCP> <4257@omepd.UUCP>
Sender: news@omepd.UUCP
Reply-To: merlyn@intelob.intel.com (Randal L. Schwartz @ Stonehenge)
Distribution: na
Organization: Stonehenge; netaccess via BiiN, Hillsboro, Oregon, USA
Lines: 39
Posted: Thu Mar 30 13:59:43 1989
In-reply-to: myself :-)

In article <4257@omepd.UUCP>, a rather dingy merlyn@intelob writes:
| Here's my solution, just 5 lines of (mostly readable :-) Perl...
| 
| #!/usr/bin/perl
| for ($file = "stdin", $line = 1; $_ = <stdin>;) {
| 	($line = $1, $file = $2, $sync = 0, next) if /^#\W*(\d+)\W+"(.*)"\W*$/;
| 	($sync = 0, $line++, next) if /^\W*$/;
| 	printf "#line %d \"%s\"\n", $line, $file unless $sync++;
| 	print; $line++;
| }

Arrrgh.  Someone should beat into my head that non-word != whitespace.
All the '\W' up there should be '\s'. Anyway, for those of you that
want the "new and improved" version, try this (I call it "seepp"):

---------------------------------------- cut here --------------------
#!/usr/bin/perl
open(CPP,$a = "/lib/cpp " . join(" ",@ARGV) . "|") ||
	die "Cannot exec '$a' ($!)\n";
for ($file = "/dev/null", $line = 1; $_ = <CPP>;) {
	($file = $1, $line = $2, $sync = 0, next) if /^#\s*"(.*)"\s+(\d+)\s*$/;
	($file = $1, $line++, $sync = 0, next) if /^#\s*"(.*)"\s*$/;
	($line = $1, $file = $2, $sync = 0, next) if /^#\s*(\d+)\s+"(.*)"\s*$/;
	($line = $1, $sync = 0, next) if /^#\s*(\d+)\s*$/;
	($sync = 0, $line++, next) if /^\s*$/;
	printf "#line %d \"%s\"\n", $line, $file unless $sync++;
	print; $line++;
}
close(CPP) || die "Cannot close '$a' ??? ($!)\n";
exit(0);
---------------------------------------- cut here --------------------
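The script above accepts four shapes of cpp marker line: `# "file" N`, `# "file"`, `# N "file"` and `# N`. A C program taking the same approach would need a small recognizer for them; the following is a hypothetical sketch (the names `parse_marker`, `MARK_LINE` and `MARK_FILE` are illustrative, not from the post). Like the Perl patterns, it rejects `#define`, `#pragma` and any marker with extra trailing fields.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MARK_LINE 1   /* a line number was found */
#define MARK_FILE 2   /* a quoted file name was found */

static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/*
 * Hypothetical recognizer for the four marker shapes:
 *     # "file" N     # "file"     # N "file"     # N
 * Returns 0 if s is not a marker, else MARK_LINE and/or MARK_FILE.
 * *line and file are written only when the whole line parses.
 * Requires cap > 0; over-long names are truncated.
 */
int parse_marker(const char *s, long *line, char *file, size_t cap)
{
    const char *name = NULL, *name_end = NULL;
    long num = 0;
    int got = 0;

    if (*s != '#')
        return 0;
    s = skip_blanks(s + 1);
    while (*s != '\0' && *s != '\n') {
        if (*s == '"' && !(got & MARK_FILE)) {
            name = s + 1;
            name_end = strchr(name, '"');
            if (name_end == NULL)
                return 0;                   /* unterminated name */
            s = name_end + 1;
            got |= MARK_FILE;
        } else if (isdigit((unsigned char)*s) && !(got & MARK_LINE)) {
            char *end;
            num = strtol(s, &end, 10);
            s = end;
            got |= MARK_LINE;
        } else {
            return 0;                       /* e.g. "#define", "#pragma" */
        }
        s = skip_blanks(s);
    }
    if (got & MARK_LINE)
        *line = num;
    if (got & MARK_FILE) {
        size_t n = (size_t)(name_end - name);
        if (n >= cap)
            n = cap - 1;
        memcpy(file, name, n);
        file[n] = '\0';
    }
    return got;
}
```

The caller would then apply the script's update rules: a new line number resets the count, a new file name replaces the current one, and either event forces a fresh `#line` before the next text line.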
(side note to Larry: if you wanna stick this in 'eg', go ahead...)
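Why the \W to \s change matters: in the first version, a line consisting only of `}` or `};` is made entirely of non-word characters, so `/^\W*$/` classed it as blank and the script skipped it (while still counting the line). The C sketch below approximates Perl's two classes with `<ctype.h>` in the C locale; it is an illustration, not Perl's own implementation, and the function names are hypothetical.

```c
#include <ctype.h>

/* Approximation of Perl's \s: a whitespace character. */
static int perl_space(int c)
{
    return isspace((unsigned char)c) != 0;
}

/* Approximation of Perl's \W in the C locale: anything but [A-Za-z0-9_]. */
static int perl_nonword(int c)
{
    return !(isalnum((unsigned char)c) || c == '_');
}

/* True when every character of s satisfies pred, i.e. s matches /^X*$/. */
static int all_chars(const char *s, int (*pred)(int))
{
    for (; *s != '\0'; s++)
        if (!pred(*s))
            return 0;
    return 1;
}

/* The first version's "blank line" test, /^\W*$/ */
int blank_by_W(const char *line) { return all_chars(line, perl_nonword); }

/* The corrected test, /^\s*$/ */
int blank_by_s(const char *line) { return all_chars(line, perl_space); }
```

Both tests agree on truly empty lines; they disagree exactly on lines of punctuation such as a closing brace, which is the text the first version lost.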

Probably should have had lunch first,
-- 
/     Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095        \
|        on contract to BiiN (for now :-) Hillsboro, Oregon, USA.             |
|<@intel-iwarp.arpa:merlyn@intelob.intel.com> ...!uunet!tektronix!biin!merlyn |
\     Cute quote: "Welcome to Oregon... home of the California Raisins!"      /


Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: mice and mountains
Message-ID: <9996@megaron.arizona.edu>
Date: 30 Mar 89 23:43:56 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <7472@phoenix.Princeton.EDU>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 45


In article <7472@phoenix.Princeton.EDU>, bernsten@phoenix.Princeton.EDU (Dan
Bernstein) writes:
> Dave Brower asks for a filter ``that will take "blank line" style cpp
> output on stdin and send to stdout a scrunched version with appropriate
                                                              ^^^^^^^^^^^
> #line directives.'' If we may combine built-in utilities to handle the
> problem, then this 9-line shell script will do it (combine the last
> two lines to make it 8):
> 
>   #!/bin/sh
>   ( tr XY '\375\376' | sed 's/^\(.\)\(.*\)/X\1\2Y/
>   tend
>   i\
>   X#line
>   d
>   :end
>   =' | uniq | tr '\012X' ' \012'; echo ''; ) |
>   sed 's/Y.*//' | tr '\375\376' XY | sed -n '1!p'

I am not sure this is what the original poster wanted: ``appropriate''
may refer to #line directives with line numbers that reference the 
source file, not the cpp output.  Regardless, the above script is 
truly trivial in Lex:

%%
\n\n+	printf("\n#line %d \n", yylineno);
.|\n	ECHO;

> Ahem? Are we forgetting sed here? (Then again, I hate awk, love sed,
> and prefer C to lex. I'd rather have a sed script twice as slow as an
> awk script. But that's just personal bias.)

How could one forget sed (:-)?  But for matching patterns that cross
line boundaries, Lex is a natural, because it sees a file as a stream of
characters rather than as a stream of records. Sed and awk are record-based
and thus seem forced for multi-line matching.  Prefer C to Lex? Hmmm... Lex
is just the machinery for a pattern-based switch statement, with the user
supplying ``case'' statements written in C.
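The stream-versus-records point can be seen in plain C as well. Below is a sketch (not from the thread) of the same transformation as the three-line Lex version and Bernstein's script, done one character at a time with only a count of pending newlines and no line buffer. Each run of one or more empty lines becomes `#line N`, where, as noted above, N counts lines of the cpp output rather than of the original source. The names `squeeze`, `put` and `struct outbuf` are hypothetical, and input comes from a string instead of stdin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: output buffer that copies what fits and stays
   NUL-terminated (cap must be > 0). */
struct outbuf { char *p; size_t len, cap; };

static void put(struct outbuf *b, const char *s, size_t n)
{
    while (n-- > 0 && b->len + 1 < b->cap)
        b->p[b->len++] = *s++;
    b->p[b->len] = '\0';
}

/*
 * Sketch of  \n\n+  ->  "#line N"  as a character-stream filter.
 * N is the input line number of the next non-empty line.  Lines holding
 * only blanks or tabs count as text, as in the Lex rule.  Trailing
 * empty lines are dropped because no line follows them.
 */
void squeeze(const char *in, char *out, size_t cap)
{
    struct outbuf b = { out, 0, cap };
    long line = 1;      /* input line number of the current character */
    int pending = 0;    /* newlines read but not yet copied */
    int started = 0;    /* has any text been copied yet? */
    char dir[32];

    out[0] = '\0';
    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            pending++;
            line++;
            continue;
        }
        if (pending > 0) {
            /* the first pending newline ends the previous text line */
            int empties = started ? pending - 1 : pending;
            if (started)
                put(&b, "\n", 1);
            if (empties > 0) {
                snprintf(dir, sizeof dir, "#line %ld\n", line);
                put(&b, dir, strlen(dir));
            }
            pending = 0;
        }
        started = 1;
        put(&b, in, 1);
    }
    if (pending > 0 && started)
        put(&b, "\n", 1);
}
```

For the input `a`, two empty lines, `b`, `c`, this yields `a`, `#line 4`, `b`, `c`. Because the filter never assembles a line, a pattern spanning several newlines costs nothing extra, which is the advantage claimed for Lex over record-based tools.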

John Rupley
rupley!local@megaron.arizona.edu


Path: arizona!noao!ncar!ames!lll-winken!uunet!munnari!otc!metro!bunyip!uqcspe!qfagus!anvil!michi
From: michi@anvil.oz (Michael Henning)
Newsgroups: comp.bugs.sys5
Subject: lex bug ?
Keywords: lex, parsing
Message-ID: <294@anvil.oz>
Date: 31 Mar 89 06:35:08 GMT
Organization: Anvil Designs Pty Ltd, Brisbane, Australia
Lines: 47

I came across the following today. The problems show up both under AIX and
Xenix. Could anyone please enlighten me as to whether these are real bugs
or whether I am just overlooking something?

Problem 1:

The following lex input file recognizes comments as a '*' at the beginning
of a line followed by any number of characters to the end of the line:

%%
^\*.*$	printf("comment: %s\n", yytext);

If the file is changed to

comment	^\*.*$
%%
{comment}	printf("comment: %s\n", yytext);

then comments are no longer recognised.


Problem 2:

The following lex input file is supposed to recognize empty lines.

{empty_line}	^$
%%
{empty_line}	printf("empty line\n");

The program compiles, but does not recognize empty lines. If the program
is changed to

%%
^$	printf("empty line\n");

then lex reports a syntax error on line 2.


Any help on these would be greatly appreciated; I have not used lex before,
and the documentation is somewhat terse in places...

					Michi.
-- 
               | The opinions expressed are my own, not those of my employer. |
               |                                                              |
               | Michael (Michi) Henning                                      |
               | - We have three Michaels here, that's why they call me Michi |


Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: Lex solution
Message-ID: <10029@megaron.arizona.edu>
Date: 31 Mar 89 21:13:38 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <26389@cornell.UUCP>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 25


>From rupley!local Fri Mar 31 13:43:14 1989
In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.

The following Lex source is somewhat shorter than a previous Lex version.
Specifications assumed:  single blank lines, as well as runs of blank
lines +- <#> line directives, are to be replaced by <# lineno
"filename">; only truly blank lines (no space or tab) are to be
considered blank.  
------------------------------------------------------------------------
	char f[80];
%S P
%%
#.+\n	{sscanf(yytext,"#%d%s",&yylineno,f);BEGIN P;}
<P>.+\n	{printf("# %d %s\n",yylineno-1,f);ECHO;BEGIN 0;}
\n	BEGIN P;
.+\n	ECHO;
------------------------------------------------------------------------
John Rupley
rupley!local@megaron.arizona.edu


Path: arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Scrunch blank lines
Summary: yet another Lex version
Message-ID: <10031@megaron.arizona.edu>
Date: 1 Apr 89 01:57:42 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <10029@megaron.arizona.edu>
Distribution: na
Organization: U of Arizona CS Dept, Tucson
Lines: 20


In article <620@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>So, I offer this week's challenge:  Smallest program that will take
>"blank line" style cpp output on stdin and send to stdout a scrunched
>version with appropriate #line directives.

Yet another Lex version:
------------------------------------------------------------------------
	char f[80]; int x;
%%
#.+\n	{sscanf(yytext,"#%d%s",&yylineno,f); x++;}
.+\n	{if(x)printf("# %d %s\n",yylineno-1,f); ECHO; x=0;}
\n	x++;
------------------------------------------------------------------------
John Rupley
rupley!local@megaron.arizona.edu


From local Mon Apr  3 20:57 MST 1989
To: arizona!uunet!munnari!otc!metro!bunyip!uqcspe!qfagus!anvil!michi
Subject: Re: lex bug ?
Cc: local
Status: R


Nope -- no bug.

> I came across the following today. The problems show up both under AIX and
> Xenix. Could anyone please enlighten me as to whether these are real bugs
> or whether I am just overlooking something?
> Problem 1:
> The following lex input file recognizes comments as a '*' at the beginning
> of a line followed by any number of characters to the end of the line:
> %%
> ^\*.*$	printf("comment: %s\n", yytext);
> If the file is changed to
> comment	^\*.*$
> %%
> {comment}	printf("comment: %s\n", yytext);
> then comments are no longer recognised.

I had hoped to see an answer on the net, but none appeared.  Your
questions are good, but you should have answered them yourself by
experimenting.

Several remarks.  First, regarding comments, the algorithm you use is
wrong, at least for C, where comments are delimited by /* and */ and
need not start at the beginning of a line.

But this is not the point.  Your question is, I believe, why can't
^...$ be in a definition?  The answer is that in the Rules section <^>
is special only as the first character on the rule's line, and <$> only
as the last character before the pattern ends (at a space, |, or
whatever).  If you use a definition, it is referenced as {....}, so the
^ and $ inside it are no longer at the edges of the rule's pattern, and
clearly you cannot get their expected effect.  This is undocumented but
reasonable, when you think about it.

> Problem 2:
> The following lex input file is supposed to recognize empty lines.
> {empty_line}	^$
> %%
> {empty_line}	printf("empty line\n");
> The program compiles, but does not recognize empty lines. If the program

See above remarks...

> is changed to
> %%
> ^$	printf("empty line\n");
> then lex reports a syntax error on line 2.

Hmm -- wouldn't you feel embarrassed if you expected a pattern between
<^> and <$> and found none?

Lex is indeed powerful, but it pays to be explicit: if you mean
end-of-line, match the \n itself; and if you want the other stuff to be
printed, say so with a rule rather than relying on lex's default action
of copying unmatched input (else you may get some odd behavior).
Better, try:

%%
^\n	;
.|\n	ECHO;
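In plain C, those two rules amount to a one-flag filter: a newline seen at the start of a line (an empty line) is discarded, and everything else is copied. This is a hypothetical sketch for comparison, not Rupley's code; `drop_empty_lines` is an illustrative name, and as with the Lex version, lines containing only spaces or tabs are kept.

```c
/*
 * Sketch of the two Lex rules
 *     ^\n    ;
 *     .|\n   ECHO;
 * out needs room for strlen(in) + 1 bytes; the output is never longer
 * than the input.
 */
void drop_empty_lines(const char *in, char *out)
{
    int at_line_start = 1;          /* ^ holds at the start of input */

    for (; *in != '\0'; in++) {
        if (*in == '\n' && at_line_start)
            continue;               /* ^\n  ;  (discard empty line) */
        *out++ = *in;               /* .|\n ECHO; */
        at_line_start = (*in == '\n');
    }
    *out = '\0';
}
```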

John Rupley
 uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
 (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929