View Full Version : strllen() under Linux & Heisenbug !?
felix
04-07-2004, 03:00 PM
Hi,
1 - Does anyone know where I can find an implementation of strllen() to use under Linux? I can't find one... :(
2 - I have read about heisenbugs (http://info.astrian.net/jargon/terms/h/heisenbug.html). Based on that, I tried some tests with uninitialized variables and overflows (but in that case the program doesn't work at all), and everything could be debugged normally under gdb and libtrace. Does anyone know an example of a heisenbug I could test against gdb and libtrace?
ps.: I'm very curious... 8O
Regards
RobSeace
04-07-2004, 10:40 PM
strllen()? With 2 "l"s?? Never heard of it... Neither has Google,
apparently... ;-) Describe what it does...
Well, the main thing about Heisenbugs is that they are, by nature,
unpredictable and depend on random circumstances... They
aren't really something you can easily replicate on command...
Typically, they rely on very specific stack contents, which just
happen to cause some really strange and completely unexpected
behavior, usually due to a buffer overflow at some point, which
trashed memory in a very subtle way, or something similar... I
mean, you can code up a completely artificial demonstration of
one, I suppose, but it wouldn't truly give you insight into the real
beast in all its horrific glory; you can only gain that insight by
being forced to deal with one in real-world code... ;-) But, even a
trivial demo would have to be written tied specifically to a certain
machine and memory layout to really get it to behave as you
want... But, the basic idea is that there's some bug in the program
that depends somehow on a specific memory layout, and doing
any sort of debugging (using an actual debugger, or adding
debugging code to try to track down the problem) changes the
memory layout enough that the problem no longer appears... You
can create a demo program that's artificially dependent on a
certain memory layout by just identifying what is different when
you're running in the debugger versus normally... As a trivial
example of something that's different, think of your parent PID
(ie: getppid())... Say you had something that somehow ended up
referencing your PPID, and doing something inappropriate and
unexpected because of its value (mainly, because it shouldn't be
depending on it in the first place like it is, which probably occurred
due to some other bug subtly trashing memory, and redirecting a
pointer somewhere it doesn't belong)... Now, when run normally,
the value of the PPID may cause it to just die, or do whatever
weird stuff you're seeing... However, when run under a debugger,
the PPID will be different (namely the PID of the debugger), and
so the behavior could very well be different... In fact, maybe that
particular value causes it to work seemingly normally, with no
bugs... It's a difficult artificial scenario to purposefully recreate,
but can you envision how it might happen, at least? (The fact that
it's so hard to purposefully recreate goes hand-in-hand with it being
so difficult to debug when it occurs accidentally, I think... ;-))
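A deliberately artificial sketch of the PPID idea above, for illustration only. The names outcome_for_ppid() and report_this_run() and the little outcome table are made up for this example. They stand in for a real bug in which some trashed pointer makes the program depend on a value it should never look at. It assumes a POSIX system for getppid().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the "real" bug: the program ends up using the
 * parent PID as a table index. Only one slot leads to misbehaviour. */
int outcome_for_ppid(pid_t ppid)
{
    static const int outcome[4] = { 0, 0, 0, -1 };  /* -1 means "goes wrong" */

    assert(ppid >= 0);
    return outcome[(unsigned long)ppid % 4];
}

/* Reports what this particular run does. The result can differ between a
 * run from the shell and a run under a debugger, because the parent
 * process (and therefore the PPID) is different. */
int report_this_run(void)
{
    pid_t ppid = getppid();
    int result = outcome_for_ppid(ppid);

    printf("ppid=%ld -> %s\n", (long)ppid,
           result == 0 ? "looks fine" : "misbehaves");
    return result;
}
```

Calling report_this_run() from main(), once from the shell and once under gdb, may or may not print different results, depending on which PIDs the two parents happen to get. That unpredictability is rather the point.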
I once had such a bug because of a hardware problem. I received data
from an external sensor and processed it internally. The width was limited
to 6 bits and the upper 2 bits were set to zero in hardware. Of course I had
optimized my code relying on that behaviour. But due to damage in the
port area of the setup, the 8th bit could occasionally be set to 1
(it only happened when a heavy truck started at the traffic lights in
front of the building). The value was used in one case as an offset to a
pointer accessing a byte array that then influenced the machine controller.
When the bit flipped, the offset landed way past the defined area, causing
the machine to go into an emergency halt. Now, when the error first
occurred, I took the hardware back to the lab and ran tests with equal values
while monitoring my code. It ran a whole week with no error whatsoever. We
installed it again for the field test. The error came back, always
unpredictable; there was just no pattern to work with. It took a while until
I stumbled over the fact that it occurred when this truck passed by. So I
installed hardware monitoring equipment and then I could see that the 8th
bit changed value. It took me another day to pinpoint the exact spot. After
that I masked those bits out before working with the values. It cost me
over a month all in all, and without the sheer luck I had it might have been
longer.
Since that day I always make sure to initialise variables, and if I assume a
certain range of values I check those before working with them too.
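A minimal sketch of the masking fix described above, assuming an 8-bit sample whose low 6 bits carry the data and a 64-entry byte array. The names sanitize_sample(), lookup_sample(), SENSOR_MASK and TABLE_SIZE are invented for the example; the original code is not shown in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENSOR_MASK 0x3Fu  /* keep only the low 6 bits of the sample */
#define TABLE_SIZE  64u    /* 2^6 entries, one per valid sample value */

/* Clear the upper 2 bits instead of trusting the hardware to keep them 0. */
uint8_t sanitize_sample(uint8_t raw)
{
    return (uint8_t)(raw & SENSOR_MASK);
}

/* Look up a table entry using the sanitized sample as the offset. */
uint8_t lookup_sample(const uint8_t table[TABLE_SIZE], uint8_t raw)
{
    uint8_t index = sanitize_sample(raw);

    assert(table != NULL);
    /* Redundant after the mask, but kept to show the explicit range check. */
    if (index >= TABLE_SIZE)
        return 0;
    return table[index];
}
```

With the mask in place, a sample of 0x85 (8th bit set by the glitch) is treated as 0x05 and stays inside the array.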
RobSeace
04-08-2004, 02:12 PM
Heh. Great story... But, that sounds more like a mandelbug (http://catb.org/~esr/jargon/html/M/mandelbug.html)
than a heisenbug... ;-) Or, the story about "Magic" (http://catb.org/~esr/jargon/html/magic-story.html)... ;-) (And,
while we're discussing Jargon File terms and stories, my favorite
Jargon File story has to be The Story of Mel (http://catb.org/~esr/jargon/html/story-of-mel.html)... ;-))
That's a nice story. I was a bit like that back in 1982 and still wrote pure
machine code in 1992, when the story from the other posting happened.
Not quite as extreme, but times had changed after all. And I never even
considered writing such code on a PC platform as I just never liked the
x86 architecture. I miss it though :cry:
I wonder what Mel would say about Java... :twisted:
felix
04-13-2004, 02:58 PM
Hi RobSeace & Nop! :)
strllen()? With 2 "l"s?? Never heard of it... Neither has Google,
apparently... ;-) Describe what it does...
It is a function from the strl*() family rather than strn*(). In general they do what the strn*() family should do: put a \0 in the last byte of the buffer, never compare or copy more than maxlen, etc. ;-)
Take a look at this interesting discussion:
http://forums.devshed.com/t69821/s.html?highlight=getting+chars+from+text+javascript
ps.: I could implement it myself, but I thought it existed under Linux; it appears that it exists only in BSD. But as it's C (ANSI!?) I can copy it to my Linux box and use it. ;-)
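For reference, a rough sketch of what a strl*()-style copy does, written from the behaviour described above (always null-terminate, never write more than the buffer size). my_strlcpy() is a made-up name for this example and this is not the BSD source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, writing at most size bytes including the '\0'.
 * Returns the length of src, so the caller can detect truncation by
 * checking whether the return value is >= size. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len;

    assert(src != NULL);
    src_len = strlen(src);

    if (size != 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;

        assert(dst != NULL);
        memcpy(dst, src, n);
        dst[n] = '\0';      /* always terminated, unlike strncpy() */
    }
    return src_len;
}
```

For example, copying "hello" into a 4-byte buffer leaves "hel" in the buffer and returns 5, which tells the caller the string was truncated.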
About the heisenbug: it really is very hard to reproduce! :/
I tried several kinds of uninitialized variables and getpid() tricks, but it never faults only in the debugger or disassembler.
ps.: In a disassembler it's even harder, because it only opens the file and intercepts the instructions... o.O
If someone has ideas... or code... heheeh
Thkz,
Regards.
RobSeace
04-13-2004, 07:46 PM
Well, I know about strlcpy(), strlcat() and those strl*() functions,
but I've just never heard of strllen()... And, honestly, I'm not sure
I understand what it's supposed to do that strnlen() doesn't do...
There's no null-termination (or modification of any kind) to be done
by a strlen()-like function: it's simply calculating an integer value
and returning it... So, what exactly does strllen() do that strnlen()
does not? (And, I believe strnlen() is a GNU extension, not standard
C... And, no the other BSD strl*() functions aren't standard C,
either... But, they probably should be...)
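To make the point concrete, here is a minimal sketch of a bounded length function in the style of strnlen(). my_strnlen() is a made-up name so it doesn't clash with the library version. Note that it only reads and counts; there is nothing for it to terminate or modify.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of s, but never look at more than maxlen bytes.
 * If no '\0' is found within maxlen bytes, return maxlen. */
size_t my_strnlen(const char *s, size_t maxlen)
{
    size_t len = 0;

    assert(s != NULL);
    while (len < maxlen && s[len] != '\0')
        len++;
    return len;
}
```

So my_strnlen("hello", 3) gives 3 and my_strnlen("hi", 10) gives 2; in both cases the buffer is left untouched.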
I think I had a Heisenbug a few months ago, however, I'm not sure if this really can be classified as such:
I developed a very small application to process CDRs (Call Data Records) in plain text format, to add fields that represented the operators for numbers A, B and C.
This application was good, it could process more than 1M records in less than 10 seconds, while the previous super-fast database implementation of the same process was taking about 2 minutes to process ~40K records.
After a few days of implementing it, the guy that was using it called me to tell me that he just started processing the files (we needed to process about 8 months of records at around 30M records/month) and that for some files the program was breaking...
I recompiled the program with the debug libraries (MS VC++) to try and run it using the debug environment, but this time the application didn't break on the file... We tested it on several of the files that were breaking the application before, but the application finished correctly. However, the non-debug version always "core dumped" when we ran it on those files...
After a few minutes reviewing the output files of the crashed process, I found the problem and removed the bug. The new program ran perfectly, and didn't crash!!!
RobSeace
04-14-2004, 10:44 PM
Yep, that's a classic example of a Heisenbug: attempts to debug
it make the problem (or, at least the symptoms) go away...
What did the bug turn out to be, do you remember? Was it some
sort of buffer-overflow or other memory-trashing? Those tend to
often be the usual culprits...
Rob,
It was a very simple problem as you say. I was using an n-ary tree to process phone numbers, find the prefixes and related operators.
As the numbers were processed beginning from the first digit, each digit would select the corresponding tree branch...
To make implementation easy, I created a simple structure for each tree node, having an array of pointers, the array was fixed size because phone numbers only contain numeric digits (0..9).
The array was declared as: tree_node *children[10];
The code to select the children was like:
...
child = this_node->children[ number[cur_pos] - '0' ];  // no check that the char is '0'..'9'
if (child != NULL) {
    // Continue traversing tree
    this_node = child;
    cur_pos++;
    ...
}
...
'number' is the name of the variable containing the phone number under analysis. 'cur_pos' is the index inside the string for the digit currently being visited (incremented each time a child node is found)...
I think you get the idea.
However, the CDRs sometimes contain some 'special' characters, and as you can see I was not checking for that, I was always expecting every character to be in the range of ASCII chars '0' and '9'.
When the number dialed was *123 the number coded in the CDR would be 'B123'... Imagine where that "pointer" could have taken the application!!!
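A sketch of the kind of check that avoids this, assuming a node layout like the one described above. walk_prefix() and the matched parameter are invented for this example; the actual fix is not shown in the post. In ASCII, 'B' - '0' is 18, well outside children[10], so the walk stops at any non-digit instead of indexing with it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct tree_node {
    struct tree_node *children[10];   /* one slot per digit 0..9 */
} tree_node;

/* Follow the digits of number down the tree. Stops at the first
 * character that is not a digit, or at the first missing child.
 * Returns the deepest node reached and stores the number of digits
 * consumed in *matched. */
const tree_node *walk_prefix(const tree_node *root, const char *number,
                             size_t *matched)
{
    const tree_node *this_node = root;
    size_t cur_pos = 0;

    assert(root != NULL && number != NULL && matched != NULL);

    /* isdigit() is false for '\0' too, so this also ends at end of string. */
    while (isdigit((unsigned char)number[cur_pos])) {
        const tree_node *child = this_node->children[number[cur_pos] - '0'];

        if (child == NULL)
            break;
        this_node = child;
        cur_pos++;
    }
    *matched = cur_pos;
    return this_node;
}
```

With this version a record like 'B123' simply matches zero digits and returns the root node, and the caller can decide how to handle it.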
Well, I'm glad to see I have had a Heisenbug!!!
felix
04-23-2004, 07:17 PM
Hi All!!
Sorry for the delay in replying! ;-)
About strllen(): now that I think about it more, I'm not sure what advantage it has over strnlen(), since it doesn't touch the stack (to fill it with '\0'). Maybe the implementation checks sizeof(element)-1 and puts a \0 there!? Impossible? Huuhehuehue
Anyway, I'm searching the web for the implementation of this function, since I don't have BSD here. If I find it I will post the source here and we'll see what the difference is. :)
About the heisenbug, I had a little success by overwriting only the frame pointer in some cases, but it doesn't produce a core dump; it only generates some strange output, sometimes, in SOME debuggers. Anyway, I'm still trying... :)
Regards.