Lesser known tricks, quirks and features of C

217 points by rramadass 2 days ago

fuhsnn 2 days ago

My recent favorite is glibc's hack to implement _Static_assert under C99: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...

It uses the constant expression to create a bitfield of size -1 when failed, and leaves the compiler to error on that as the intended assertion. The actual statement is an extern pointer to a function returning a pointer to an array which has sizeof the aforementioned bitfield struct as its size.

Another one encountered in Toybox is (0 || "foo") being a const expression that evaluates to 1. Apparently the string literal must have been soundly created in data section, so its pointer address is safely assumed to be non-zero.

lifthrasiir 2 days ago

You have missed one important thing: every passing assertion will define a single extern function pointer with the same signature, so multiple `_Static_assert` invocations can coexist in a single scope. An extern definition doesn't have to be a function pointer by the way, I guess it helped a linker to have an easier time when removing unused symbols.
- fuhsnn 2 days ago
  
  Oops too late to edit, that's really a function prototype. So it wouldn't take storage space or affect symbol unless the user naughtily calls the __Static_assert_function.

wolfspaw 2 days ago

Really liked the trick of defining the struct in the return part of the function.

Array pointers: Array to pointer decay is extremely annoying, if it was implemented as Array to "slice" decay it would be great.

Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

flexible array member: extremely useful, and now there are good compiler flags for ensuring correct flexible array member usage

X-Macro: nice, no-overhead enum to string name. Didn't know the trick

Combining default, named and positional arguments: Named-arguments/default-arg, C version xD. It would be cool if it was added to C language as a native feature, instead of having to do the struct hiding macro.

Comma operator: really useful, especially in macros

Digraphs, trigraphs and alternative tokens: digraphs and trigraphs are rarely useful, but the alternative spellings from iso646.h are awesome, love using and/or instead of &&/||

Designated initializer: super awesome, could not use if you wanted C++ portability. Now C++ supports some part of it.

Compound literals: fantastic, but in C++ it will explode due to stack deallocation in the same line. C++ should fix this and allow the C idiom >/

Bit fields: nice for more control of structs layout

constant string concat: "MultiLine" String, C version xD

Ad hoc struct declaration in the return type of a function: didn't know this trick, "multi value" return, C version xD

Cosmopolitan-libc: incredible project. Already knew of it, it's awesome to offer a binary that runs on all OSes at the same time.

Evaluate sizeof at compile time by causing duplicate case error: ha, nice trick for debugging the size of anything.

WalterBright 2 days ago

> Array to pointer decay is extremely annoying, if it was implemented as Array to "slice" decay it would be great.
It's not just annoying, it's the major source of bugs in shipped code. A fix:
https://www.digitalmars.com/articles/C-biggest-mistake.html
- wolfspaw 2 days ago
  
  I agree wholeheartedly, I really liked your article and fix.
  (In fact, I already had your article bookmarked xD, and I’m familiar with and truly admire your work)
fuhsnn 2 days ago

>Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/
The first array size is actually always decayed to a pointer, supporting it in a compiler without analysis passes like TCC is just a matter of skipping the "static" token and the size.
jcelerier a day ago
> Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/
C++ does?
```
    #include <cstddef>
    #include <iostream>
    void print(const int (&array)[5]) {
      for (std::size_t i = 0; i < 5; ++i)  // the bound is fixed by the parameter type
        std::cout << array[i] << " ";
    }
```
will fail at compile time if you pass it anything other than an int[5] array
- mananaysiempre a day ago
  
  Including an int[6] array, unlike with int[static 5]. This is usually not what you want.
  - jcelerier 9 hours ago
    
    > This is usually not what you want.
    very interesting comment considering I'm literally fighting with stupid languages with this kind of permissive rules right now, which definitely just create more bugs (for instance silently dropped values because an upstream API changed, added an element at the end of the list, you updated but since you get no error you now have to go through all the calls to check them one-by-one)
    
    mananaysiempre 2 hours ago
    
    Remember, in C you cannot use anything but a literal constant for the array size. My reference for how useful strict array-size matching can be under such circumstances is standard Pascal (as opposed to Modula-ish Pascals like Borland’s), and the answer there is that it more or less isn’t. Even in C, I’d expect at least some people would actually use things like int(*array)[5], given this syntax is valid even in C89, but in function signatures I’ve literally never encountered it.
    If the size could be a (type-level) variable, that would be a very different proposition. But variables lead to expressions, expressions lead to functions, functions lead to suffering^W becoming a full-fledged dependently typed programming language—if not an Agda or an Idris then at least an ATS[1]. I’d welcome that, but as far as I can see the ergonomics are very much not there yet for any kind of low-level programming.
    [1] https://ats-lang.sourceforge.net/
  - skribanto a day ago
    
    I believe int[6] can still be passed to static 5 but I would have to double check
xeyownt 2 days ago

Pointer decay is not a mistake.
It is what lets you write int *p = arr and loop over the array elements with p++.
If p kept the array type, p++ would jump past the last element on the first iteration.
- kaba0 a day ago
  
  It is, an array and a pointer are different types. There could be ways to convert it to a pointer, but it shouldn’t happen at so many places, implicitly.
  - teo_zero a day ago
    
    Correct. A method to extract a pointer from an array already exists:
    int *p = &arr[0];
    The mistake is to allow this:
    int *p = arr;
    
    uecker a day ago
    
    And yet, coding styles do not prohibit it and there is no compiler that has a warning.

lifthrasiir 2 days ago

I hate I know all of them...

> Backslash line splicing

One reason trigraphs were removed is that `??/`, the trigraph spelling of `\`, also acted like `\` in this context.

> Using `&&` and `||` as conditionals

Not only is this uncommon, but chaining them is not always correct, because `a && b || c` is not equivalent to `a ? b : c` when `b` evaluates to false.

> Compile time assumption checking using `enum`s

Please use `static_assert` already.

> Matching character classes with `sscanf()`

This can be combined with `*` to ignore certain characters. For example `%*[ \t]` will skip all horizontal whitespaces, unlike a plain whitespace which will also skip newlines.

> Detecting constant expressions

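This ultimately comes from C's weird way of spelling a null pointer: a null pointer constant is defined as any integer constant expression with the value 0, or such an expression cast to `void *`. So a non-constant expression can be distinguished by multiplying it by a known zero constant and checking whether the result still behaves as a null pointer constant.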

fsckboy 2 days ago

> `??/`, a trigraph spelling for `\`, also acted like `\` in this context.
OF COURSE it should do what \ does, otherwise you have no other way to get a \
the point of trigraphs is to allow characters to be entered that your character-set/terminal keyboard doesn't allow.
- lifthrasiir 2 days ago
  
  That's technically true, but it could have been designed much better if that was the real intention:
  1. There is no real reason that trigraphs should be expanded inside a comment. Preprocessors can't make any additional comment, so the comment should be scanned and discarded as soon as possible but trigraphs somehow precede that.
  2. And this very behavior of backslash should have been also deferred as much as possible. ISO C already has two sets of doubly-quoted literals `"asdf"`, where one is used for normal string literals and another is used for preprocessing because `#include "foo\bar.c"` should refer to the file name that contains a backslash, not a backspace (`\b`). Since `#include FILENAME` is also possible, such literals may appear anywhere in the preprocessing line! Therefore we already have to defer processing of some backslashes, so why should remaining backslashes be processed that early?
  In my ideal design, a backslash is either a part of tokens (`+\<newline>=`, `foo\<newline>bar` or `"asdf\n\<newline>fdsa"`) or a standalone token optionally followed with a newline (`\<newline>`). No backslashes within comments are significant, effectively solving the first point. These tokens are then turned into normal tokens or whitespaces respectively, so they remain transparent to the parser. The trigraph could then have been allowed to replace backslashes in such cases (e.g. `+??/<newline>=`) without affecting remaining cases like comments.
  For the record, later digraphs are more or less designed as such but they lack backslashes, even though ISO/IEC 646 still doesn't contain backslashes for all charsets. This hints that the inclusion of trigraphs or digraphs was more due to vendor complaints (known to be IBM) than actual concerns from users who wouldn't be able to type backslashes if it were true.
  - fsckboy 2 days ago
    
    >1. There is no real reason that trigraphs should be expanded inside a comment.
    trigraphs are 100% substitutes for unrepresentable characters. they absolutely positively ALWAYS should be replaced by the character. Pretend it takes place before the character even arrives inside the comment, because it does.
    it's very much like the #define/#include c-preprocessor step, it happens first, that's what keeps it clean, understandable, manageable. (Sure you can have more complex macro systems, but they are... complex, they can get very ugly)
    if you know how to process a unix shell commandline, you know that there are layers to it. Trigraphs are just like that. If you don't know how a unix shell commandline is processed, learn it, it's worth knowing.
    
    lifthrasiir 2 days ago
    
    I'm talking about why trigraphs had to behave in such a way, not how. C and C++ have a concept of source character set and execution character set, which can diverge. Let's say trigraphs are indeed for unrepresentable characters, then in which character set are they unrepresentable? If the answer is for source, alternative spelling is sufficient and comments should ideally have no effect or users will be confused. If the answer is for execution, why do other characters have no equivalent?
    Also you should be aware that the macro expansion in C/C++ is not like a literal string replacement. `#define FOO bar` doesn't turn `BAREFOOT` into `BAREbarT` or `"OH FOO'S SAKE"` into `"OH bar'S SAKE"`. (Some extremely old preprocessors did do so, by the way.) `#define FOO(x) FOO(x)` doesn't make `FOO(bar)` into an infinite recursion because `FOO` is prevented from expansion when `FOO` itself is being already expanded. There are certainly some layers, but they are not what you seem to think.
    
    fsckboy 2 days ago
    
    you want to be able to convert source code from one system to another and back again, and you want the rules to be simple so that everybody who writes such a converter gets it right, and you also don't want to think about a zillion edge cases. If the trigraphs exist on the wrong side of the conversion, flag them. otherwise, it's a very simple process.
    I was not talking about how the preprocessor is implemented, I was talking about the layering. You keep wanting to mix layers because you think you know better; thar be dragons.
    
    lifthrasiir 2 days ago
    
    Layering is only valuable when that serves its goals well. I don't see any reason to have an additional layer in the language here. If you are thinking about a strict separation between preprocessor and parser, that is already known to be suboptimal in compilation performance decades ago. (As a related example, a traditional Unixy way to separate archiving and compression is also known to be inefficient; a combined compressing archiver is better in design.)
    
    poincaredisk a day ago
    
    I disagree with the downvotes here. C language "layers" are tricky to get right, a source of footguns and a potential backdoor (especially the trigraph that started this comment chain), and overall a bandaid invented when there were no better solutions (like modules, or Unicode). Trigraphs are a weird archaic quirk of the C language (and no other modern language), and I'm glad to see them gone.
    And since we're thinking about layers, character encoding hacks should be entirely outside of a programming language responsibility. Now that would be a proper layering.

saagarjha 2 days ago

Mentioning %n without explaining that it is overwhelmingly used for exploits is a little reckless IMO.

_kst_ 2 days ago
Background: A %n format specifier in a printf call stores the number of characters written so far into a specified variable. For example:
```
    #include <stdio.h>
    int main(void) {
        int count;
        printf("%s%n\n", "hello, world", &count);
        printf("count = %d\n", count);
    }
```
The output is:
```
    hello, world
    count = 12
```
%n can be exploited to write data to an arbitrary memory location, but only if the format string is something other than a string literal.
%n can be exploited, but it's entirely possible to use it safely.
- lifthrasiir 2 days ago
  
  I think another problem exposed by %n was that you can't easily compose format strings. Sure, `printf(str)` where `str` is a user input would be easy to detect and can be automatically turned into `printf("%s", str)` with some macro hack, but `printf(fmt, ...)` where `fmt` is composed from multiple partial format strings would be harder to reason.
greiskul 2 days ago

I'm curious about this, didn't know about %n before. What are the common pitfalls and exploits using this enables?
- mananaysiempre 2 days ago
  
  You would expect a printf call with a user-controlled format string to be, at worst, an arbitrary read. Thanks to %n, it can be a write as well.
- lights0123 2 days ago
  
  If the user can control the formatting string, they can write to pointers stored on the stack. It's important to use printf("%s", str) instead of printf(str).
  - rep_lodsb 2 days ago
    
    Useless use of printf; what's wrong with "puts(str)"?
    
    shawn_w 2 days ago
    
    puts() adds a newline at the end. gcc will happily turn printf("%s\n", str) into puts(str), though.
    I've never tested to see if printf("%s", str) becomes the equivalent fputs(str, stdout)

coreyp_1 2 days ago

That's a nice list!

I've been digging into cross-platform (Windows and Linux) C for a while, and it has been fascinating. On top of that, I've been writing a JIT-ted scripting (templating) language, and the ABI differences (not just fastcall vs stdcall vs cdecl) are often not easy to find documentation about.

I've decided that if I ever get to teach a University class on C again, I wanted to cover some of these things that I feel are often left out, and this list is a helpful reference! Thanks!

rramadass 2 days ago

This is actually a pretty good list and that's why i submitted it to HN. The Chinese stratagem "Cast a Brick to attract Jade" is relevant here though i haven't yet seen much "Jade" from others :-) The author's presentation/explanation is also quite succinct and precise with references pointing to further details and thus the overall s/n ratio is very good. This is how tech stuff should be written (contrast with meandering articles with one technique being "explained" over five pages).
Knowing these sort of techniques is important because they force you to think in different ways to solve a problem which expands one's mental design space. C (and C++) is particularly important here since it is the common "lingua-franca" across all system/application software from servers to desktops to itty-bitty MCUs.
PS: Also see the book Fluent C by Christopher Preschern which while not dealing with "tricks" shows how to use C effectively using a pattern-like approach.

winocm a day ago

There’s also the use of typedef to help make function declarations.

Such as:

  typedef void fptr_t(int);
  fptr_t foo;

That would effectively declare a function with the prototype: `void foo(int)'. This pattern is used quite a bit in BSD kernels.

jonathrg 2 days ago

Multi character constants is one of the many things in C that would be nice to use if the language would just choose some well-defined behaviour for it. It doesn't really matter which.

mananaysiempre 2 days ago

Mainstream compilers agree on multicharacter literals being big endian; that is, 'AB' is usually 'A' << CHAR_BIT | 'B'. The exception is MSVC, which also works like that as long as you don't use character escapes, but if you do it emits some sort of illogical, undocumented mess that looks like an ancient implementation bug fossilized into a compatibility constraint.
- poincaredisk a day ago
  
  Your phrasing is a bit confusing. Multicharacter literals being big endian (as you've defined it) means that they'll actually end up little-endian in memory (on little-endian architectures, like x86). So 'ABCD' will end up as 'DCBA' in memory.
  Really interesting to hear about the escaping quirk, I need to test it.

johnklos 2 days ago

Not sure what happened:

404

File not found

The site configured at this address does not contain the requested file.

ericpruitt 2 days ago

I'm getting the same error. This appears to be the article in question: https://github.com/Jorenar/blog/blob/gh-pages/_posts/2023-02...

golergka 2 days ago

    switch (n % 2) {
        case 0:
            do {
                ++i;
        case 1:
                ++i;
            } while (--n > 0);

    }

Someone really ought to record a "WAT" video about C.

mananaysiempre 2 days ago

The switch statement in C is not a very limited pattern match. The switch statement in C is a very ergonomic jump table. Do not think ML’s case-of with only integer literals for patterns; think FORTRAN’s computed GO TO with better syntax. And it will cease to be a WAT. (For a glimpse of the culture before pattern matching was in programmers’ collective consciousness, try the series on designing a CASE statement for Forth that ran for several issues of Forth Dimensions.)
- russellbeattie 2 days ago
  
  I don't think there's any confusion of how it works, it's the deep horror in discovering that it's possible in the first place, and a morbid curiosity of the chaos it could cause if abused.
  - mananaysiempre 2 days ago
    
    At least for me, the feelings you describe are characteristic of a footgun, not a WAT. A WAT is rather a desperate bewilderment as to who could ever design the thing that way and why, and for switch statements computed gotos are the answer to that question.
    As for the footgun issue, I mean, it could be one in theory, sure. But I don’t think I’ve ever seen it actually fired. And I can’t really appreciate the Javaesque “abuse” thinking—it is to some extent the job of the language designer to prevent the programmer from accidentally doing something bad, but I don’t see how it is their job to prevent a programmer from deliberately doing strange things, as long as the result looks appropriately strange as well.
    (There are reasons to dislike C’s switch statement, I just don’t think the potential for “abuse” is one.)
PhilipRoman 2 days ago

Just think of the "case" statements like any other label, despite the misleading indentation. Then it becomes perfectly natural to jump in the middle of a loop.
- rramadass 2 days ago
  
  Right; This is how i worked out "Duff's Device". Wikipedia has a very helpful "Simplified Explanation" - https://en.wikipedia.org/wiki/Duff%27s_device#Simplified_exp...
- rramadass a day ago
  
  Here is an example : https://news.ycombinator.com/item?id=41670144

rramadass a day ago

This is actually pretty useful in some usecases. One very good example is Simon Tatham's "Coroutines in C" (https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html) to resume execution in a function after the point it returned from in the earlier call.

The relevant example code is;

  int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: goto LABEL0;
        case 1: goto LABEL1;
    }
    LABEL0: /* start of function */
    for (i = 0; i < 10; i++) {
        state = 1; /* so we will come back to LABEL1 */
        return i;
        LABEL1:; /* resume control straight after the return */
    }
  }

becomes;

  int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1"*/
            return i;
            case 1:; /* resume control straight after the return */
        }
    }
   }

tom_ 2 days ago

This sort of thing is pretty handy sometimes. Don't forget you can have code (e.g., start of the loop) before any of the cases too!
lifthrasiir 2 days ago

Just turn every single IOCCC winning entry into a video, you now have a year's supply of contents.
agumonkey 2 days ago

I wonder if there's any other instance (in programming or else) of intersecting grammar constructs being accepted.
- 082349872349872 2 days ago
  
  control "structures" in forth, although in this case the notion of "grammar construct" is more in the head of the user than in the implementation...
  - agumonkey a day ago
    
    yeah that's too loose :D
pjmlp a day ago

C definitely belongs in the set of WAT languages.

guerrilla 2 days ago

These are great. Most posts I read with titles similar to this are just the authors revealing that they don't know C very well but this one included some interesting things. I didn't know compound literals were lvalues but if you think about executable formats, it makes a lot of sense.

rramadass a day ago

The references linked to are also a pretty good source of similar info.

ranger_danger 2 days ago

> quirks and features

Someone is a fan of Doug DeMuro.

randomdata 2 days ago

This... is the 1972 Ritchie C

38 2 days ago

    > int (*ap1)[10] = &arr;

Wow that's garbage syntax. With Go it would be

    var ap1 *[10]int = &arr

rramadass 2 days ago

I actually find the C syntax easier to read and understand.
- lifthrasiir 2 days ago
  How about the following then? I can read them but by no means they are intuitive.
  int *x[10]; // How is this different from `ap1` above?
  int (*f (void))[10];
  - eMSF 2 days ago
    
    Well, obviously it doesn't have parentheses. It's not like this is the only instance where adding parentheses affects the end result.
    You could write even more complex declarators (but don't have to), but that would not prove that some other syntax is inherently intuitive. Case in point, I cannot parse the Go syntax as I do not know Go.
    In my experience pointers to arrays are rather uncommon and I'm not sure that I've ever written a function returning one, having even less of a need for a pointer to such. (Thus out of all these, only your first example is somewhat common in practice.)
    
    lifthrasiir a day ago
    
    > I cannot parse the Go syntax as I do not know Go.
    Or you probably never even tried. You should be immediately able to parse it if I provide a hint that `*`, `&` and `[10]` mean roughly the same thing as C, because `*[10]int` has no reasonable reason to be parsed as an array of 10 copies of something. You can't do so in C.
    
    rramadass a day ago
    
    Right. These are just "old chestnuts" used to scare C noobs particularly in interviews. IIRC the K&R C book itself had an example program to convert C declarations to English and also there exists a utility program called "cdecl" to do the same.
    
    lifthrasiir a day ago
    
    Better to use that English explanation as a model of readable syntax.
    
    rramadass a day ago
    
    It is but you just have to know how to map it.
  - rramadass 2 days ago
    
    These are all simple (not necessarily intuitive) if you know how operator binding works in C, (using braces to highlight);
    int *x[10];          --->  { int* } x[10];
    int (*f (void))[10]  --->  int { { (* {f (void) } ) } [10]; }
    The point is that once you have had some practice you can work it out and Go's syntax is not necessarily much better.
    
    lifthrasiir a day ago
    
    Your original comment said it "is easier to read and understand", not "can be worked out after some practice". Of course it is not like you should inline every `typedef`s into a single mess of complicated types, but you never said that you believe only simplest types should be used and C syntax is easier for them.
    In any case, I think Go is a clear winner here because all logical types are consecutive tokens. For example `f` in my example is (as you correctly parsed) a function that returns a pointer to an array of 10 integers, but that array type is normally written `int [10]` or `int NAME[10]`, while here it is written in two chunks, `int` and `[10]`, with a big parenthesis in between.
    
    rramadass a day ago
    
    Again, my using "is easier to read and understand" was w.r.t. the parent's claim w.r.t. Go's syntax. You understood it wrong to mean an absolute general case.
    You need practice for complicated things and that is what i was pointing out with "can be worked out after some practice" and not that everything trivial needed practice.
    "Go is a clear winner here" is your claim and not necessarily one that i agree with since as mentioned, knowing the binding rules and a little practice complicated declarations are not that big of a deal.
    
    lifthrasiir a day ago
    
    Agreed that it's not actually a big deal (hence "here"), but it does strengthen a point that the C syntax wasn't designed carefully after all. The current C type syntax was completely accidental and any reasonable design could have avoided that. If that was too late for some reason, one could have defined a new parallel syntax that solves this problem. In fact C++ did so via its new function declaration syntax `auto f(...) -> ...`. Guess why...
    
    rramadass a day ago
    
    > The current C type syntax was completely accidental and any reasonable design could have avoided that.
    Absolutely baseless claim.
    The C syntax and language is the product of a small group (not a committee) of smart people with the goals of syntactical brevity, close to machine architecture (PDP-7/11) and the design goal of building along the path of BCPL->B->C. Dennis Ritchie himself explains the rationale in his paper The Development of the C Language and so one does not need to make untenable assumptions. The enduring success of the language (even in the face of all the developments since then in Computer HW and PLT) is proof of the validity of its design goals. Its "Abstract Machine" is simple and there is no complicated Object Model with the syntax merely being a thin veneer over a sequence of bytes. Contrast it with most modern languages (which seem to be designed to solve world peace/hunger and everything in between) and C appears more and more relevant these days. C++ used judiciously without a lot of "new features" introduced by the standards committee (the bane of the language) takes it to the next level sweet spot.
    
    lifthrasiir a day ago
    
    Not exactly, read the exact paragraph in The Development of the C Language:
    > [...] In all these cases the declaration of a variable resembles its usage in an expression whose type is the one named at the head of the declaration.
    > The scheme of type composition adopted by C owes considerable debt to Algol 68, although it did not, perhaps, emerge in a form that Algol's adherents would approve of. The central notion I captured from Algol was a type structure based on atomic types (including structures), composed into arrays, pointers (references), and functions (procedures). Algol 68's concept of unions and casts also had an influence that appeared later.
    Algol 68 by the way had a sensible type syntax that is akin to Go's. Ritchie "explains" the syntax difference by having the same syntax for the type and the expression, but that's not exactly why. For example there are actually two expressions relevant here, `*` dereference and `&` reference. Why was the former used? Why couldn't `*` be made a postfix at this point if the same syntax was a big concern? Ritchie himself never fully elaborated on those points, and I think that's because it was never his primary concern at all.
    As you have noted, it is very important to realize that C did have its design goals (for which C did a very excellent job). On the other hand it would be very misleading to claim that C had some bigger visions even at that time and they validate its design! Ritchie and others were clearly smart but they didn't design C to be something magnificent, it was a rough product from various trade-offs they had to endure. So why this particular syntax? Well, because B already picked a prefix `*` which Ritchie didn't want to change at all, and he allowed it to infect all other aspects of the type syntax without much consideration. (Or more neutrally, Ritchie couldn't figure out any other option when he had to keep the B compatibility. But keep in mind that B had already changed that operator from BCPL.)
    Technically speaking it was somehow designed, but only by following the path of the least resistance, not a solid reasoning, hence my use of the word "accidental". There are many other examples, such as the logical `&` (etc.) being renamed to `&&` but the bitwise `&` keeping the original precedence order because nothing was done. To be fair to Ritchie though, it is not his fault, but rather a fault of the whole software community that was fixated on this ancient language designed for specific purposes way too long.
    
    rramadass a day ago
    
    The relevant paragraphs from Ritchie's paper are not just what you quoted (much of your comment is not relevant) but much more;
    For each object of such a composed type, there was already a way to mention the underlying object: index the array, call the function, use the indirection operator on the pointer. Analogical reasoning led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,
    int i, *pi, **ppi;
    declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation that i, pi, and ppi all yield an int type when used in an expression. Similarly,
    int f(), *f(), (*f)();
    declare a function returning an integer, a function returning a pointer to an integer, a pointer to a function returning an integer;
    int *api[10], (*pai)[10];
    declare an array of pointers to integers, and a pointer to an array of integers. In all these cases the declaration of a variable resembles its usage in an expression whose type is the one named at the head of the declaration.
    The above can be summarized as the following two points;
    1) "Declaration reflects Use" (from K&R C book)
    which leads us to the corollary,
    2) Syntax is variable-centric rather than type-centric i.e. You look at how the variable is supposed to be used and then work out its type.
    > To be fair to Ritchie though, it is not his fault, but rather a fault of the whole software community that was fixated on this ancient language designed for specific purposes way too long.
    Again, this is completely baseless and merely your opinion. I have already pointed out the main design goals which drove its design and the fact that it is still relevant today is proof of that. The "simplicity" of its very design is its greatest strength. The fact that modern needs (eg. Strong Type Safety, Multi-paradigm, Security, Concurrency etc.) require us to design/seek out more language features is not a reflection on the language itself since it was designed well before these "wants" became "needs". On the other hand the various extensions of C (eg. Objective-C, Concurrent-C, Handel-C etc.) are proof of its versatility and extensibility and hence its enduring relevance.
    
    38 a day ago
    
    > C appears more and more relevant these days
    C is not relevant any more, not sure what world you are living in. it only has any relevance because it was the best option at the time decades ago, and so people are forced to use it when making syscalls. thats it.
    
    rramadass 17 hours ago
    
    What? There are more embedded devices than ever running C/C++ code today. All OSes, System utils etc. are still done in C/C++. All higher level performance oriented frameworks/libraries in any domain (eg. AI/ML) are implemented in C/C++ and then an interface to them is given through wrappers in other languages. Also C is the common "lingua-franca" across languages.
    C is still in the top five in the TIOBE index today.
    
    38 16 hours ago
    
    > All OSes, System utils etc. are still done in C/C++.
    first of all, no. plenty of OS are made in other languages. also, the big OS WERE written in C, and only remain so in order to avoid redoing millions of lines of code.
    > and then an interface to them is given through wrappers in other languages
    again this is only done because the OS is using an outdated language, so people are forced to work with it.
    > C is still in the top five in the TIOBE index today
    that doesnt matter, this does:
    https://madnight.github.io/githut/#/pull_requests/2024/1
    
    rramadass 13 hours ago
    
    You seem to be living in your own world and not willing to face Reality.
    First of all, TIOBE considers data from multiple sources to come up with its ranking. Github by itself is not enough; there are orders of magnitude more code outside of it and hence your assumption is wrong. Also most C/C++ folks prefer to keep code local (proprietary and personal reasons) and hence are not sampled. You can only get an idea indirectly by volume of objects/software shipped/used containing their C/C++ code.
    > plenty of OS are made in other languages.
    No. All of them are "experimental" and i don't know of any that are production worthy. Also writing an OS as part of some study project in $FAV_LANG is not tenable here.
    > again this is only done because the OS is using an outdated language, so people are forced to work with it.
    You have not understood what i wrote at all. I mentioned "performance oriented frameworks/libraries" which have nothing to do with OS but just domain specific code eg. AI/ML, Gaming etc. The OS interface itself is just a very small part of it. But their domain logic is all implemented in C/C++ with thin wrappers for other languages.
    Summarizing; C/C++ are as relevant as ever today for all systems. To drive the point even more strongly C and C++ occupy two spots in the top five TIOBE index.
    
    38 8 hours ago
    
    > Github by itself is not enough; there are orders of magnitude more code outside of it and hence your assumption is wrong. Also most C/C++ folks prefer to keep code local (proprietary and personal reasons) and hence are not sampled.
    "you're wrong man, C totally has a bunch of code being used thats private, I swear". you could say that about every single other language. only thing that matter is what can be measured. C is dead man, you are just in denial. its an old crap language that hasn't been relevant in at least a decade. if you need some evidence, just look to the fact that after decades it still doesn't have a package manager, so many people laughably just vendor code when working with C projects.
pavlov 2 days ago

Maybe you missed the part where this is C, you know, the language designed by many of the same people as Go but 35 years earlier.
It would be a time warp worthy of the Rocky Horror Picture Show if C's design could take syntax ideas from Go.

o11c 2 days ago

Bah, those are all well-known.

What value does the following program return?

    int main()
    {
        int *p = 0;

    loop:
        if (p)
            return *p;

        int v = 1;
        p = &v;
        v = 2;
        goto loop;
        return 3;
    }

Also, rather than doing `sizeof` via one error at a time, it's better to just emit them to a char array {'0' + sz/10, '0' + sz%10, '\0'}. Generalizing this to signed numbers of arbitrary size is left as an exercise for the reader.
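
One way to read the char-array suggestion is sketched below, assuming sizes below 100 so that two decimal digits suffice. The struct, the macro name `SIZE_DIGITS` and the variable names are hypothetical.

```c
#include <string.h>

/* Hypothetical type whose size we want to read without running anything. */
struct packet { char tag; int payload[3]; };

/* Two decimal digits of sizeof(T); only valid for sizes 0..99. */
#define SIZE_DIGITS(T) { (char)('0' + sizeof(T) / 10), \
                         (char)('0' + sizeof(T) % 10), '\0' }

/* Both initializers are constant expressions, so the digits are
   baked into the object file at compile time. */
const char packet_size_str[] = SIZE_DIGITS(struct packet);
const char char_size_str[]   = SIZE_DIGITS(char);
```

The digits end up in the compiled object's data, so several sizes can be read from a single build instead of provoking one error per size.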

_kst_ 2 days ago

It returns 2.
The only reason that might be surprising is that the "return *p;" statement refers to the value of an object at a point (textually) before its definition. But the lifetime of the object named "v" begins on entry to the innermost compound statement enclosing its definition -- in this case the body of "main".
Space for "v" is allocated on entry to "main". It's initialized to 1 when its definition is reached. The "return *p;" statement appears before the definition of "v" in the program source, but is executed after its definition was reached at run time, and within its lifetime.
It's important to remember that scope and lifetime are two different things. The scope of an identifier is the region of program text in which the identifier is visible; for "v" it extends from the definition to the closing "}". The lifetime of an object is the time span during execution in which it exists; for "v" it extends from the time when execution reaches the opening "{" to the time when execution reaches the closing "}". Formally, storage for "v" is allocated at the beginning of its lifetime and deallocated at the end of its lifetime. Compilers can and do optimize allocation and deallocation, as long as the visible behavior is consistent.
Aside: If "v" were a VLA (variable length array, introduced in C99, made optional in C11) its lifetime would begin when execution reaches its definition.
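
A minimal sketch of the same distinction, using a hypothetical function `reinit_demo`: the object stays alive across the backward jump, while its initializer runs again each time the declaration is reached.

```c
int reinit_demo(void)
{
    int *p = 0;
    int passes = 0;

again:
    if (passes == 3)
        return *p;      /* "v" is not in scope here, but the object is alive */

    int v = 10;         /* initializer runs on every pass through this line */
    v += passes;        /* 10, then 11, then 12 */
    p = &v;
    passes++;
    goto again;
}
```

The third pass stores 12 in the object, and that is the value read through `p` on the final check.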
- shultays 2 days ago
  
  Can't it reuse v's memory for other things before v is defined? Say there is "int a = 4;" at the beginning of main that is no longer used when it reaches "int v = 1;", can't a & v share same memory location?
  - _kst_ 2 days ago
    
    A compiler can reuse memory as much as it likes -- but only if the visible behavior of the program is consistent with the language requirements.
    If you write:
    { int n = 42; printf("%d\n", n); }
    in the abstract machine, `sizeof (int)` bytes are allocated on entry to the block and deallocated on exit, but a compiler can legally replace the entire block with `puts("42")` and not allocate any memory for `n`.
    Memory for objects defined in nested blocks is logically allocated on entry to the block, but compilers commonly merge the allocation into the function entry code. Even so, objects in parallel blocks can certainly share memory:
    int main(void) { { int a; } { int b; /* might share memory with a */ } }
    Logically, memory for `a` is allocated on entry to the first inner block, and memory for `b` on entry to the second inner block. Compilers will typically allocate all the memory on entry to `main`, but can use the same address for `a` and `b`.
  - mananaysiempre 2 days ago
    
    As written, without introducing VLAs or additional blocks, no. C23 §6.2.4(5–6):
    > An object whose identifier is declared with no linkage and without the storage-class specifier `static` has automatic storage duration [...].
    > For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial representation of the object is indeterminate. If an initialization is specified for the object and it is not specified with `constexpr`, it is performed each time the declaration or compound literal is reached in the execution of the block [...]; otherwise, the representation of the object becomes indeterminate each time the declaration is reached.
    That is, a local variable is live from the moment the block that contains its declaration is entered (however and wherever that happens) until it is left (ditto), but is initialized or, for lack of a better word, uninitialized each time execution passes that declaration (however many times that happens, including none). This is despite the fact that at compile time the variable’s name is not in scope until the = introducing its initializer (or the place where such a = would go if there isn’t one). Modulo its smaller feature set, C89 §6.1.2.4(3) stipulates the same.
    In addition to GGP’s deliberately confusing example, this permits the much more reasonable and C89-compatible
    switch (x) { int i, j; case 1: /* use i and j */ break; case 2: /* use i and j */ break; }
    The only exception is locals of variably modified type (e.g. variable-length arrays), whose declarations you can’t jump over on pain of undefined behaviour.
    No wonder basically every C compiler allocates a single stack frame at function entry.
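    A compilable sketch of that switch pattern, with a hypothetical function name and values. The declaration sits before the first case label and is never "executed", yet both objects exist for the whole switch block, so each case must assign before reading.

    ```c
    int classify(int x)
    {
        switch (x) {
            int i, j;   /* no initializers: control jumps past this line */
        case 1:
            i = 10; j = 1;
            return i + j;
        case 2:
            i = 20; j = 2;
            return i + j;
        default:
            return 0;
        }
    }
    ```

    An initializer on `i` or `j` would be skipped by the jump to the case label, which is why the sketch assigns inside each case instead.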
sweeter 2 days ago

Is it 2? I'm not exactly sure though. I'm interested in hearing the logic
- _kst_ 2 days ago
  
  See my comment above:
  https://news.ycombinator.com/item?id=41664474
- tylerhou 2 days ago
  
  gcc, msvc, and clang all produce code that exits with code 2: https://godbolt.org/z/WEYjns85Y