rupley.com -- Coding Contests

Contests

A browser can access this directory, Contests, for navigation and listing (click on the link in this sentence or the Section header). Directory browsing is the only way to view the files in this directory or its subdirectories.

Some years ago, when a question on code was posted on comp.lang.c or certain other netgroups, a thread might have started, presenting different solutions to the problem, in effect a contest. This directory has three threads of this kind from the late 1980's (click on a link below to access a subdirectory for navigation and listing):

Contests/uncomment -- uncomment C code: solutions in the C, lex, and sed languages.

Contests/SED -- filter out repetitions of a specific triple of lines: solutions in the sed and lex languages;

Contests/scrunch - compress (scrunch) C preprocessor (cpp) output to make it more readable: solutions in the awk and lex languages;

The value in exercises of this kind, beyond one's amusement, is in demonstrating or learning the possibilities in a language or approach. The sed commands for uncomment are a tour de force. The lex solution for the SED problem is a startlingly simple one liner.

Following Sections describe the threads in detail. Again, to view code or other files, you must browse a directory, by clicking on a Section-heading link.

Contests/uncomment

Directory listing

./uncomment
|-- 0Postings		directory - statement of problem; responses ....
|   |-- postings.txt			text file summary of postings
|   `-- uncomment-[0a]-[1-4].html	5 html files of postings
|-- 0README		this file
|-- Makefile		to test solutions, run: make test
|-- Solutions		directory - ca. 10 solutions that were posted
|-- Tests		directory - results of tests (see Makefile)
|-- Trials-lex		directory - different lex solutions
|-- eatC.c		chris torek's C solution
|-- eatLex.l		john rupley's lex solution
|-- eatSed		maarten litmaath's sed solution
`-- uncomment.tst2	test file with pathological C comments

PROBLEM: remove comments from C code.

SOLUTIONS: three selected; one written in C (by Chris Torek), another a sed script (by Maarten Litmaath), the third a lex source (by myself).

It is instructive to compare the examples.

The C code is fast and straightforward but difficult to get right (Chris Torek wrote it in 10 minutes, correctly, but several other posters did not do so well). Length: 58 lines.

Maarten Litmaath's sed script is a tour de force, and worth study for its techniques. The posting (in file: 0postings/postings.txt) comments on the method. But it is not the simplest or the fastest way to uncomment. Length: 78 lines.

Lex is preferred, IMHO - simple, easy to write and get right, and in time of execution close to the C-code solution. And it's the shortest by a factor of 3 or more. Length: 15 lines.

Lex is character-stream oriented, as distinguished from the line orientation of sed and awk, so it serves best for matching patterns that cross line boundaries. This is perhaps the take-home message of the exercise.

C-code solution (by Chris Torek):

/*
In article <16539@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
In article <9864@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>Score, anyone? (recent postings tested on K&R-I-syntax code)
>
>	sed        1/1 correct
>	Lex        2/2 correct
>	C          2/2 wrong

This sounds like a CHALLENGE!  :-)

I wrote the following working against the ten-minute spaghetti clock.
It is slightly tested, and probably works, with the exception of

	#include <some/*weird:file[syntax]>

(and unclosed comments, etc., in included files).  It is more
permissive than real C (allowing newlines in string and character
constants, and allowing infintely long character constants) but should
not get anything wrong that cpp gets right.

Of course, there are no comments in it. :-)
*/

#include <stdio.h>

enum states { none, slash, quote, qquote, comment, cstar };

main()
{
	register int c, q = 0;
	register enum states state = none;

	while ((c = getchar()) != EOF) {
		switch (state) {
		case none:
			if (c == '"' || c == '\'') {
				state = quote;
				q = c;
			} else if (c == '/') {
				state = slash;
				continue;
			}
			break;
		case slash:
			if (c == '*') {
				state = comment;
				continue;
			}
			state = none;
			(void) putchar('/');
			break;
		case quote:
			if (c == '\\')
				state = qquote;
			else if (c == q)
				state = none;
			break;
		case qquote:
			state = quote;
			break;
		case comment:
			if (c == '*')
				state = cstar;
			continue;
		case cstar:
			if (c != '*')
				state = c == '/' ? none : comment;
			continue;
		default:
			fprintf(stderr, "impossible state %d\n", state);
			exit(1);
		}
		(void) putchar(c);
	}
	if (state != none)
		fprintf(stderr, "warning: file ended with unterminated %s\n",
			state == quote || state == qquote ?
				(q=='"' ? "string" : "character constant") :
			"comment");
	exit(0);
}
/*
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
*/

Sed script solution (by Maarten Litmaath):

: loop
/^$/{
	x
	p
	n
	b loop
}
/^"/{
	: double
	/^$/{
		x
		p
		n
		b double
	}
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^"/b break
	/^\\/{
		H
		x
		s/\n\(.\).*/\1/
		x
		s/.//
	}
	b double
}
/^'/{
	: single
	/^$/{
		x
		p
		n
		b single
	}
	H
	x
	s/\n\(.\).*/\1/
	x
	s/.//
	/^'/b break
	/^\\/{
		H
		x
		s/\n\(.\).*/\1/
		x
		s/.//
	}
	b single
}
/^\\/{
	H
	x
	s/\n\(.\).*/\1/
	x
	b break
}
/^\/\*/{
	s/.//
	: comment
	s/.//
	/^$/n
	/^*\//{
		s/..//
		b loop
	}
	b comment
}
: break
H
x
s/\n\(.\).*/\1/
x
s/.//
b loop

Lex solution (by John Rupley):

%{
 /*
 * long strings (with escaped newlines) blow yytext[YYLMAX]; 
 * 	this is a feature, not a bug.
 */
%}
STRING          \"(\\\n|\\\"|[^"\n])*\"
COMMENTBODY     ([^*\n]|"*"+[^*/\n])*
COMMENTEND      ([^*\n]|"*"+[^*/\n])*"*"*"*/"
QUOTECHAR       \'[^\\]\'|\'\\.\'|\'\\[x0-9][0-9]*\'
ESCAPEDCHAR     \\.
%START  COMMENT
%%
<COMMENT>{COMMENTBODY}          ;
<COMMENT>{COMMENTEND}           BEGIN 0;
<COMMENT>.|\n                   ;
"/*"                            BEGIN COMMENT;
{STRING}                        ECHO;
{QUOTECHAR}                     ECHO;
{ESCAPEDCHAR}                   ECHO;
.|\n                            ECHO;

For a test of all three solutions:

	make spotless test

For other solutions offered, see the directory Solutions.

For discussion and yet more solutions, see the comp.lang.c postings:

0Postings/postings.txt

or, the more readable

0Postings/uncomment-[0a]-[1-4].html

Contests/SED

Directory listing

./SED
|-- 0Postings		directory - statement of problem; responses ....
|   |-- postings.txt		text file summary of postings
|   `-- sed-[ab]-[12].html	3 html files of postings
|-- 0README			this file
|-- dewit.sed		sed solution - Leo de Wit
|-- jar.sedA		sed solution - John Rupley
|-- litmaath.sed	sed solution - Maarten Litmaath
|-- jar.l			lex solution - John Rupley
|-- a.out			executable for lex solution
`-- testfile		test file, filtered by sed or lex solutions

PROBLEM:

Got a sed question for you: I'm creating a sed filter to mask out and delete unwanted characters in a trace file. I'm having trouble with this configuration:

LINE I:    PROMPT>
LINE I+1:  >	
LINE I+2:  <

Whenever I see this series, I want to delete ALL THREE lines. I can't create a pattern:

/PROMPT\>\\n\>\\n\</d

as far as I can tell, because sed won't allow patterns to span lines. Is there a way to do this, without resorting to awk or ed?

Susan Maxwell

SOLUTIONS: four selected; NOTE that the lex solution is simplest - lex fits best a problem with matching of patterns that cross line ends.

Sed solutions (only shortest listed here):

dewit.sed (10 lines)

	: start
	$p
	$q
	N
	/^PROMPT>\n>n<$/d
	/\n.*\n/{
		P
		D
	}
	b start

litmaath.sed (15 lines)

jar.sed (17 lines)

Lex solution:

jar.l (1 line + 1 separator)

	%%
	^PROMPT>\n>\n<\n                ;

The following commands test the solutions:

makelex <jar.l ; a.out <testfile >scr
dewit.sed <testfile | diff - scr
jar.sed <testfile | diff - scr
litmaath.sed <testfile | diff - scr
	[litmaath.sed misses a sequence]

For discussion, see the comp.lang.c postings:

0Postings/postings.txt

or, the more readable

0Postings/sed-[ab]-[12].html

Contests/scrunch

Directory listing

./scrunch
|-- 0NOTE		port to mac os x from irix: lex differences
|-- 0Postings		directory - statement of problem; responses ....
|   |-- postings.txt		text file summary of postings
|   `-- scrunch-[12]-[12].html	3 html files of postings
|-- 0README		this file
|-- Lex-trials		direcetory - different lex solutions to problem
|-- Makefile		to test awk and lex solutions, run: make test
|-- scrunchAwk		awk solution of problem - (stephen peckham)
|-- scrunchLex.l	lex solution - (john rupley)
`-- scrunchLex.l.irix	irix version, ported to mac os x as above file

PROBLEM: scrunch C preprocessor output to make it more readable.

Compress runs of "#" lines and blank lines, or runs of two or more blank lines:

	(\n*# lineno "file"\n+)+    or    \n\n+

into a single line:

	# lineno "file"\n

which is output before the next line of program text (corresponding to line "lineno" of the source "file"). The values of "lineno" and "file" are adjusted for changes in source resulting from #include statements.

SOLUTIONS: an awk version (by Stephen Peckham) and a lex version (by myself). Both solutions are comparably simple. Here the character-stream processing of lex has no advantage over the line-oriented processing of awk. Thus, IMHO, awk (or perl) is preferable. C-code or sed-script solutions would be more complicated.

Awk solution (From: peckham@svax.cs.cornell.edu (Stephen Peckham):

awk '
{if (NF == 0) blanks++
	else if ($1=="#") {l_no = $2-1; f = $3; blanks = 2;}
	else {
		if (blanks > 1) print "#", l_no, f;
		else if (blanks == 1) print "";
		blanks = 0;
		print $0;
	}
	l_no++;
}'

Lex solution (John Rupley rupley!local@cs.arizona.edu):

	char f[2048]; int x; int xxlineno;
W	[ \t]*\n
%%
#.+\n		{sscanf(yytext,"#%d%s",&xxlineno,f); x++;}
{W}/{W}|#	x++; xxlineno++;
{W}		(!x) ECHO; xxlineno++;
.+\n		{if(x) printf("# %d %s\n",xxlineno,f); 
			ECHO; xxlineno++; x=0;}

To test the solutions, run:

make clean test

The test consists of (see Makefile):

1. creating a test cpp file from the lex program output, lex.yy.c (lex.yy.c is compiled with gcc to give the lex executable);

2. then "scrunching" the cpp file with the awk or the lex executable, and comparing the scrunched outputs (no significant differences);

3. finally, showing that the scrunched output is "good", i.e., it will compile to give a working executable identical in operation to the lex or awk executables.

For discussion, see the comp.lang.c postings:

0Postings/postings.txt

or, the more readable

scrunch-[12]-[12].html