| Messages 11-20 from thread "Want a way to strip comments from a" |
leo@philmds.UUCP (Leo de Wit) writes:
>In article <4896@cbnews.ATT.COM> smk@cbnews.ATT.COM (Stephen M. Kennedy) writes:
>|In article <9900010@bradley> brian@bradley.UUCP writes:
>|> The following works in vi: :%s/\/\*.*\*\///g
>|
>|/* And this */ important_variable = 42 /* doesn't work either! */
>
>And how about:
>
> puts(" A comment /* in here */");
>
>And you can give more examples showing it isn't that trivial; a challenge
>for the sed adept, perhaps ...
Does it *have* to be done in sed/awk/other text processor?
This problem is fairly difficult to solve using regexp/editor
commands, but it's a piece of cake to do in C:
#include <stdio.h>

void eatcomment(void);

main()
{
    int ch;
    int instring = 0;

    ch = getchar();
    while (ch != EOF) {
        switch (ch) {
        case '"' :
            instring = !instring;
            break;
        case '/' :
            if (!instring)
                if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); }
                else putchar('/');
            break;
        case '\\' :          /* in case this is a \" in a string, */
            putchar('\\');   /* pass it through now and don't let */
            ch = getchar();  /* the switch() eat it */
        }
        putchar(ch);
        ch = getchar();
    }
    exit(0);
}

void eatcomment(void)
{
    int ch;

    for (;;) {
        ch = getchar();
        while (ch == '*')
            if ((ch = getchar()) == '/') return;
        if (ch == EOF) exit(1); /* oops */
    }
}
------------
This hasn't been tested thoroughly; it's mostly
from memory.
Joe English
jeenglis@nunki.usc.edu
In article <3114@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
|
|leo@philmds.UUCP (Leo de Wit) writes:
|>
|> puts(" A comment /* in here */");
|>
|>And you can give more examples showing it isn't that trivial; a challenge
|>for the sed adept, perhaps ...
|
|Does it *have* to be done in sed/awk/other text processor?
|This problem is fairly difficult to solve using regexp/editor
|commands, but it's a piece of cake to do in C:
Piece of cake? Your program can't even strip its own comments (try it)!
Reason:
| case '"' :
| instring = !instring;
| break;
This is both a defect in your program and the reason that subsequent
comments aren't detected when using the source as input: after the
sequence '"', instring is 1. Besides, it doesn't handle multi-character
char constants (e.g. '/*', though one could perhaps argue
whether it should).
|This hasn't been tested thoroughly; it's mostly
|from memory.
If your memory was ok, the program wasn't tested thoroughly 8-).
Though the problem isn't difficult, it isn't so trivial as you thought
it was.
Leo.
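[Editor's sketch, not part of any post in the thread.] Both failures Leo
describes come from treating every '"' as a string delimiter. One way to avoid
that is a small state machine with separate states for string literals and
character constants. The code below is a minimal sketch under stated
assumptions: the name strip_comments and its string-to-string interface are
hypothetical, dst is assumed to have room for strlen(src) + 1 bytes, and
trigraphs, backslash-newline splicing and // comments are ignored.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, leaving out C block comments.
   dst must have room for strlen(src) + 1 bytes. */
enum state { CODE, IN_STRING, IN_CHAR, IN_COMMENT };

void strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t i = 0;

    while (src[i] != '\0') {
        char c = src[i];

        switch (st) {
        case CODE:
            if (c == '/' && src[i + 1] == '*') {
                st = IN_COMMENT;
                i += 2;              /* skip the opening delimiter */
                continue;
            }
            if (c == '"')  st = IN_STRING;
            if (c == '\'') st = IN_CHAR;
            break;
        case IN_STRING:
        case IN_CHAR:
            if (c == '\\' && src[i + 1] != '\0') {
                *dst++ = c;          /* copy the backslash ... */
                c = src[++i];        /* ... and the escaped character as-is */
            } else if (c == (st == IN_STRING ? '"' : '\'')) {
                st = CODE;           /* matching closing quote */
            }
            break;
        case IN_COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                st = CODE;
                i += 2;              /* skip the closing delimiter */
                continue;
            }
            i++;                     /* drop the comment character */
            continue;
        }
        *dst++ = c;
        i++;
    }
    *dst = '\0';
}
```

Called on the line '"' /* hi there */ '"', this keeps both character constants
and drops only the comment. An unterminated comment silently swallows the rest
of the input; a real tool would probably want to report that.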
In article <9797@megaron.arizona.edu> you write:
>It still doesn't work. It won't uncomment itself. Or the following line:
>
> '"' /* hi there */ '"'
>
Thanks -- I had a feeling I was forgetting something.
I wrote an uncomment program a couple years ago
(and I swear, it *did* work and it wasn't too hard
to write :-) and I was trying to recall it from
memory. Characters in single-quotes were the other
case I forgot about -- and if I had tested the program
on its own source I would have caught that oversight.
(I feel really stupid now... I think I'm going to
stop posting to this newsgroup, as I have failed to
say anything correct or intelligent for about a
month now.)
The Lex solution posted is much more elegant and
simple; but since lex isn't universally available
a C version is also useful... (I'm not going to
try a third time, though.)
--Joe English
jeenglis@nunki.usc.edu
In article <983@philmds.UUCP> leo@philmds.UUCP (Leo de Wit) writes:
>In article <3114@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
>|
>|Does it *have* to be done in sed/awk/other text processor?
>|This problem is fairly difficult to solve using regexp/editor
>|commands, but it's a piece of cake to do in C:
>
>Piece of cake? Your program can't even strip its own comments (try it)!
Here's another example in C. It *is* a piece of cake (15 minutes work).
The problem can be described with a simple automaton which is easily coded
in C (with goto's, >yech<). I've tested it on most of the pathological
examples given in this group and it seems to work.
----------------------------------------------------------------------------
/* cstrip.c
   pem@zyx.SE, 1989 */
#include <stdio.h>

main()
{
    char c, c1;

    goto into_code;

in_code:
    putchar(c);
into_code:
    switch (c = (char)getchar()) {
    case EOF:
        exit(0);
    case '\'':
        goto in_char;
    case '"':
        goto in_string;
    case '/':
        c1 = c;
        if ((c = (char)getchar()) == '*')
            goto in_comment;
        putchar(c1);
    default:
        goto in_code;
    }

in_char:
    putchar(c);
    switch (c = (char)getchar()) {
    case EOF:
[Rest of message truncated in the archive: 42 more lines]
You know, this discussion has brought up something that has bothered me
(although not a great deal).
When scanning the result of preprocessing a nontrivial C program with
many include files, one finds dozens (in some cases hundreds) of blank
lines. Obviously, they are the result of eliminating preprocessor
directives and multiline comments. What I have always wondered is why,
given the #line directive which can re-sync the preprocessor and the
compiler, does the preprocessor insist on keeping all those blank lines?
Why not eliminate them and issue a #line instead?
Just curious.
Tim_CDC_Roberts@cup.portal.com | Control Data...
...!sun!portal!cup.portal.com!tim_cdc_roberts | ...or it will control you.
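[Editor's sketch, not part of any post in the thread.] The suggestion above
can be tried out on text already held in memory. The function below is only an
illustration under stated assumptions: squeeze_blank_lines and the max_blank
threshold are hypothetical names, dst is assumed to be large enough (a
"#line N" directive can be longer than the short run it replaces), and no
claim is made that any real preprocessor works this way.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, replacing each run of more than
   max_blank empty lines with one "#line N" directive, where N is the
   number of the next non-empty line. dst must be large enough. */
void squeeze_blank_lines(const char *src, char *dst, int max_blank)
{
    long lineno = 1;     /* number of the line that starts at src */
    long blanks = 0;     /* empty lines seen but not yet written */

    while (*src != '\0') {
        size_t len = strcspn(src, "\n");          /* length without newline */
        size_t full = len + (src[len] == '\n');   /* length with newline */

        if (len == 0) {
            blanks++;                             /* defer: may be squeezed */
        } else {
            if (blanks > max_blank)
                dst += sprintf(dst, "#line %ld\n", lineno);
            else
                for (; blanks > 0; blanks--)
                    *dst++ = '\n';
            blanks = 0;
            memcpy(dst, src, full);
            dst += full;
        }
        src += full;
        lineno++;
    }
    for (; blanks > 0; blanks--)                  /* trailing empty lines kept */
        *dst++ = '\n';
    *dst = '\0';
}
```

With max_blank set to 2, the input "a", three empty lines, "b" becomes "a",
"#line 5", "b", so a compiler reading the result would still report "b" as
being on line 5.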
In article <852@lynx.zyx.SE>, pem@spunk.zyx.SE (Per-Erik Martin) writes:
> Here's another example in C. It *is* a piece of cake (15 minutes work).
> The problem can be described with a simple automaton which is easily coded
> in C (with goto's, >yech<). I've tested it on most of the pathological
> examples given in this group and it seems to work.
This one fails, too. Try:
/***/ hi there /**/
Goes to show, for a quick and clean coding of a pattern-matching
automaton, think Lex. The Lex source that was posted is so simple it
would be hard to get the logic wrong. Two out of two C postings suggest
that it may be easier to err in coding the same automaton in C.
Not to imply that C has no advantages -- the following comparison is for
size of source and for time of uncommenting main.c of an emacs distribution:

    timex/real   wc -l
      13.95        10    eatLex.l   Lex
       2.53        37    eatC.c     C code that works
    1:27.13        78    eat.sed    Maarten L's recently posted sed script
                                    (more lines than the C code :-) :-)
John Rupley
rupley!local@megaron.arizona.edu
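[Editor's sketch, not part of any post in the thread.] Inputs such as
/***/ are awkward because the closing delimiter is preceded by a run of
stars. One common way to get this wrong is to read the character after a '*'
and throw both away when it is not a '/'. Remembering only whether the
previous character was a star avoids that. The helper below is a minimal
sketch; skip_comment_body is a hypothetical name, and it is not the eatC.c
measured above, which was not posted in this part of the thread.

```c
#include <stddef.h>

/* Hypothetical helper: p points just past an opening comment delimiter.
   Returns a pointer just past the closing delimiter, or NULL if the
   comment is not terminated within the string. */
const char *skip_comment_body(const char *p)
{
    int prev_star = 0;   /* was the previous body character a '*'? */

    for (; *p != '\0'; p++) {
        if (prev_star && *p == '/')
            return p + 1;
        prev_star = (*p == '*');
    }
    return NULL;
}
```

Because prev_star starts at 0, the star of the opening delimiter is never
reused as part of a closing one, so the three characters slash, star, slash
are correctly treated as an unterminated comment.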
In article <852@lynx.zyx.SE> pem@spunk.zyx.SE (Per-Erik Martin) writes:
|Here's another example in C. It *is* a piece of cake (15 minutes work).
|The problem can be described with a simple automaton which is easily coded
|in C (with goto's, >yech<). I've tested it on most of the pathological
|examples given in this group and it seems to work.
[]
Appearances are deceptive, it won't handle trigraphs. For instance, try:
??' (trigraph for ^) and your code thinks it is in_char.
What's worse, on systems where char isn't signed and EOF == -1, it will
fail to see EOF (suggestion: don't use a char to compare against EOF).
Another cake that is hard to digest (let alone the goto's, it was baked
in only 15 minutes) 8-).
Leo.
P.S. What's the benefit of having a separate program strip off comments anyway?
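[Editor's sketch, not part of any post in the thread.] The EOF point is easy
to demonstrate. getc() and getchar() return an int so that EOF can be
distinguished from every valid byte. Stored in an unsigned char, EOF never
compares equal again; stored in a signed char on a typical system, a 0xFF
byte in the input looks like EOF. The helper below is a minimal sketch;
count_bytes is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes readable from fp until end of file. */
long count_bytes(FILE *fp)
{
    int c;          /* int, not char: must hold every byte value plus EOF */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

With c declared as int, a file containing the byte 0xFF followed by 'a' is
counted as two bytes rather than being cut short at the first byte.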
In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:
>When scanning the result of preprocessing a nontrivial C program with
>many include files, one finds dozens (in some cases hundreds) of blank
>lines. ... Why not eliminate them and issue a #line instead?
Why bother? Typically there are at most a few tens in a row. It is
probably faster to count 20 blank lines than to process one
`#line 1234' directive.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
In <9900010@bradley>, brian@bradley.UUCP writes:
>> /* Written 9:58 am Mar 9, 1989 by jrv@siemens.UUCP */
>> Does anyone have a sed or awk script which we
>> can use to preprocess the C source and get rid of all the comments before
>> sending it to the compiler?
>
> The following works in vi: :%s/\/\*.*\*\///g
>
> I don't know if it will work in sed, but it should...
Lest anyone actually be tempted to use such a naive method, you should be
aware that it DOESN'T WORK, except for the simplest case of one comment per
line and no multi-line comments. A correct sed command, which I may have
posted before (forgive me), is shown below. To use it on System V-derived
seds, you first have to delete all the comments from the sed script
itself (ironically enough!).
To see all of the reasons why the simple method doesn't work, try this:
Take the test C file appended after the sed script below and run it through
the sed script into a file. Now run diff on the original C file and the one
with comments removed. What you are looking at is all of the various ways
that comments and things looking almost like comments can be intertwined in C
source files.
Michael Condict {att|allegra}!m10ux!mnc
AT&T Bell Labs (201)582-5911 MH 3B-416
Murray Hill, NJ
-------------------- Sed script to delete C comments -------------------------
# Delete comments from C source files:
: delcom
/\/\*/{
# Change first comment delim to @ (after eliminating existing @'s):
s/@/<Used#to%be+an-At>/g
s:/\*:@:
# Read until we have the end comment:
: morecm
/\*\//!{
# Just to cut down on max buffer length:
s/@.*/@/
N
b morecm
}
# Get rid of any $'s:
s/\$/<Used#to%be+a-Dollar>/g
[Rest of message truncated in the archive: 56 more lines]
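[Editor's sketch, not part of any post in the thread.] The failure of the
naive one-line substitution quoted at the top of this post can be reproduced
in a few lines of C. The function below only emulates that substitution on a
single line, under stated assumptions: naive_strip_line is a hypothetical
name, and the greedy ".*" is modelled by deleting from the first opening
delimiter through the last closing delimiter on the line.

```c
#include <string.h>

/* Hypothetical helper: emulate the naive greedy one-line substitution
   on a single line, editing the buffer in place. */
void naive_strip_line(char *line)
{
    char *open = strstr(line, "/*");
    char *close = NULL;
    char *p;

    if (open == NULL)
        return;
    /* Greedy match: remember the LAST closing delimiter after the opener. */
    for (p = open + 2; (p = strstr(p, "*/")) != NULL; p += 2)
        close = p;
    if (close == NULL)
        return;          /* comment ends on a later line: nothing matches */
    memmove(open, close + 2, strlen(close + 2) + 1);
}
```

On Stephen Kennedy's example line with two comments, this deletes the
assignment between them as well, leaving an empty line. On Leo de Wit's
puts() example it removes text from inside the string literal, and a comment
that spans lines is not touched at all.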
In article <16492@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <16078@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:
>>When scanning the result of preprocessing a nontrivial C program with
>>many include files, one finds dozens (in some cases hundreds) of blank
>>lines. ... Why not eliminate them and issue a #line instead?
>
>Why bother? Typically there are at most a few tens in a row. It is
>probably faster to count 20 blank lines than to process one
>`#line 1234' directive.
Howsabout 'cat -s file.c | whatever' or just 'more -s file.c' ?
--Blair
"What is the sound of one
Usener posting...many times?"