
Programming |OT| C is better than C++! No, C++ is better than C

I'm curious, is there a reason for not having them?

You're implying that both can be useful, but not for the same objects?
The reason is syntactic convenience. Having to mark fields public sucks and I'd rather just declare something "data" and be done with it. :p But also slightly semantic... It would seem odd if both were present in the same class, you know?
 

Somnid

Member
The reason is syntactic convenience. Having to mark fields public sucks and I'd rather just declare something "data" and be done with it. :p But also slightly semantic... It would seem odd if both were present in the same class, you know?

Functional programmers agree with this, functions on data, no private state. In OO land it's because you're trying to abstract something away. Computed properties for example, maybe you don't expose "Radius" because it's an implementation detail of a circle but you do expose "Area".
 
The idea is interesting, and I'm impressed by the efficiency. Though I've hated it because of how difficult it sometimes is to deal with all the fights between Sun, Oracle, IBM, Microsoft and the like. I never liked how the execution chain is a slightly shadowy thing.

But for all the dislike (and more) I have with Java about the syntax and philosophy, and the installation/management issues of the JVM, there are languages I like and use that work on the JVM. Like Scala (though I sometimes wish they had kept the .NET compatibility)



I should try it indeed... Thanks for reminding me. Though I should have said that I was talking mostly about IDEs for beginners.


Will definitely look into it.

For all the things I do in Python these days, I'm still not convinced it's a great language for very large projects. The lack of any "static" type checking makes extensive tests a necessity, and it quickly explodes. It's not uncommon to spend more time writing tests than code, but in Python, it sometimes feels like a "quadratic" task :/
If you aren't already, use pytest. Unittest requires too much boilerplate and pytest has helpful features like parametrize to easily add a lot of tests for things like bad input testing.
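For instance, a minimal sketch (parse_age here is a made-up function, just to show the shape of a parametrized bad-input test):
Code:
import pytest

def parse_age(value):
    """Toy function under test: convert input to a non-negative int."""
    age = int(value)
    if age < 0:
        raise ValueError("age must be non-negative")
    return age

# One test function covers many bad-input cases.
@pytest.mark.parametrize("bad_input", ["", "abc", "-1", None, "1.5"])
def test_parse_age_rejects_bad_input(bad_input):
    with pytest.raises((ValueError, TypeError)):
        parse_age(bad_input)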

Also, write yourself a type checking decorator. I wouldn't even need that if it weren't for the fact I'm forced to use 2.7. If you can use newer Python, I'm pretty sure there are optional type annotations where you just annotate the method or function signatures.
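Something along these lines works on 2.7 (a rough illustrative sketch; the typechecked decorator and its checks are invented for the example, not a standard recipe):
Code:
import functools
import inspect

def typechecked(**expected):
    """Hypothetical decorator: check the types of named arguments at call time."""
    def decorator(func):
        # getargspec works on 2.7; on modern Python 3 use inspect.getfullargspec.
        arg_names = inspect.getargspec(func).args
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = dict(zip(arg_names, args))
            bound.update(kwargs)
            for name, typ in expected.items():
                if name in bound and not isinstance(bound[name], typ):
                    raise TypeError("%s must be %s, got %r" % (name, typ, bound[name]))
            return func(*args, **kwargs)
        return wrapper
    return decorator

@typechecked(label=str, count=int)
def repeat_label(label, count):
    return label * count

print(repeat_label("ab", 3))   # "ababab"
# repeat_label("ab", "3")      # raises TypeError before the function body runs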

I do agree that at a certain size, programming in python just doesn't make that much sense. I know python people are all about duck typing and extensive type checking isn't pythonic but screw that. I'm writing some infrastructure right now where it can't fall over. It's very important that it doesn't. We are unfortunately forced to use a scripting language for this application. Therefore there are areas where I am exhaustively type checking, non pythonic be damned.

It does suck to be writing tests that you get for free in statically typed languages, though. It could be worse. Could be like Perl, where a comparison operator could change the underlying types of objects you're comparing. Just gross.
 

MikeRahl

Member
My work is sending me to the VS Live event in Orlando this year.

Does anyone here have any experience with these conventions? Anything specific I should be looking out for or tips to make the most of it?
 
The reason is syntactic convenience. Having to mark fields public sucks and I'd rather just declare something "data" and be done with it. :p But also slightly semantic... It would seem odd if both were present in the same class, you know?

So if it's "just data", is it accessible to external users of the class or not?
 

Koren

Member
If you aren't already, use pytest.
I do... The issue is rather the amount of tests you need to write, since you need perfect coverage even more in Python than in other languages...

Also, write yourself a type checking decorator. [...] I do agree that at a certain size, programming in python just doesn't make that much sense. I know python people are all about duck typing and extensive type checking isn't pythonic but screw that.
Well, that's exactly my problem, and I share your opinion.

It's even worse because, since it's unpythonic, type checking is awful in Python 2.x and better in Python 3, but still far from perfect. Also, if you go the decorator or extensive type check route, not only do you get even worse performance, but I've mostly seen awkward and verbose solutions, and there's a reason I stay away from Java ;)

It's also a runtime solution, whereas I like the compile-time checks you get in compiled languages.

PEP-484 and mypy are interesting (even if it feels strange at first), PEP-526 and 544 even more so. I just hope they've thought the syntax through well enough that we don't get bad surprises down the road.
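For anyone curious, a tiny sketch of what that looks like (my own toy example; mypy flags the bad call, but nothing is enforced at runtime):
Code:
from typing import List

threshold: float = 0.5   # PEP-526 style variable annotation

def mean(values: List[float]) -> float:
    # PEP-484 annotations are just metadata; CPython ignores them at runtime.
    return sum(values) / len(values)

mean([1.0, 2.0, 3.0])    # fine at runtime and for mypy
# mean("oops")           # would crash at runtime; mypy reports it as a type error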

It could be worse. Could be like perl where a comparison operator could change the underlying types of objects you're comparing. Just gross.
Definitely... Still, with custom types, since basically all operators are syntactic sugar for overloadable functions, you can't expect even types you think you know well to behave properly.
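(To illustrate the "syntactic sugar" point with a toy example: == just dispatches to whatever __eq__ the type defines, so it can return anything.)
Code:
class Weird(object):
    def __eq__(self, other):
        # A pathological __eq__: claims to be equal to everything.
        return True

w = Weird()
print(w == 12)        # True
print(w == "banana")  # True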

I mean, this was a nightmare for me a couple years ago (not this, but a bug related to this):
Code:
>>> x = numpy.uint64(2**55)

>>> x+1 == x
True

Yes, because x+1 is actually
(uint64)( ((double)x) + ((double) 1) )
not
x + ((uint64) 1)

Don't expect the documentation to mention this, either ;) Well, that's NumPy documentation, in any case...
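For what it's worth, a small sketch of the behaviour and the obvious workaround, i.e. keeping both operands uint64 (the promotion shown is the one from the NumPy versions being discussed here):
Code:
import numpy

x = numpy.uint64(2**55)

# uint64 mixed with a plain Python int promotes to float64, which has only a
# 53-bit significand, so the +1 is rounded away:
print((x + 1).dtype)                  # float64
print(x + 1 == x)                     # True

# Keeping both operands uint64 stays exact:
print((x + numpy.uint64(1)).dtype)    # uint64
print(x + numpy.uint64(1) == x)       # False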

Is it just me or...
Code:
>>> help(numpy.ndarray.T)

Help on getset descriptor numpy.ndarray.T:

T
    Same as self.transpose(), except that self is returned if
    self.ndim < 2.

    [...]

>>> A = numpy.array([0.])

>>> A.ndim
1

>>> A.T is A
False

>>> id(A)
41728192

>>> id(A.T)
45959568
Am I dumb?
 

Antagon

Member
So if it's "just data", is it accessible to external users of the class or not?

Isn't that the whole idea of a data class?

Anyway, of the languages I use, I like how Kotlin handles it. Public by default (although I'd prefer 'module' scope over public) and automatic getter/setter use. I.e., if you declare this:

Code:
class Test (val field1: String, var field2: String)

You get a class Test where field1 has just a getter method and field2 has a getter and a setter. If you call

Code:
val test = Test("a", "b")
val a = test.field1

and later implement a custom getField1() method, it'll use that instead.
 
Is there a Linux/Unix thread? I'm attempting to run Google Music Manager from a headless server (allowing music files downloaded to a seedbox to be automatically uploaded to Google Play Music, essentially) but I am coming up short and I'm not sure where to ask for help.
 
Have you guys seen this? https://www.devrant.io/feed/

Functional programmers agree with this, functions on data, no private state. In OO land it's because you're trying to abstract something away. Computed properties for example, maybe you don't expose "Radius" because it's an implementation for a circle but you do expose "Area".
Disagree. ML languages love to hide implementations. There is nothing unique about functional languages that allows them to forget about invariants.

So if it's "just data", is it accessible to external users of the class or not?
You get both :p data's a bag of public fields, a class is entirely private. I imagine data should automatically implement copy constructors, equality...
 

peakish

Member
What is good practice when implementing a random number generator in C++? I'm looking at a pretty simple program (an Ising model implementation) and want to generate a ton of integers on the range [0, N) (uniformly distributed) as well as floats on the range [0, 1).

To pass around a single object I combine the two distributions in a class with a shared generation engine. I initialize this at the start of my program and then pass it around as a pointer.

Code:
#include <random>
class UniformRNG {
public:
    UniformRNG (const int seed, const int N) { ... } /* Init generator with seed, set distributions */
    const int gen_lattice_site() { return uniform_int(engine); }
    const float gen_real() { return uniform_real(engine); }
private:
    std::mt19937 engine; // Mersenne-Twister engine
    std::uniform_int_distribution<int> uniform_int;
    std::uniform_real_distribution<float> uniform_real;
};

[...]

// Init the generator
UniformRNG rng(seed, N);

[...]

// Use in function
int i = rng.gen_lattice_site();
int j = rng.gen_lattice_site();
My first, I guess pretty stupid, question is: am I introducing a lot of overhead by wrapping the calls to the distributions in the class like this? I've compared it to creating the generator (and distributions) as statics inside the function that uses them and the benchmarks don't show a difference, but am I missing some other obvious way to write this?

Second, is the <random> library supposed to be somewhat, eh, slow? I'm comparing the above STL implementation to the original implementation of the Mersenne-Twister engine and doing
Code:
init_genrand(seed);
[...]
// Use in function
auto i = (int)(gen_real2() * L); // Or static_cast<int>(...)
auto j = (int)(gen_real2() * L);
This more than halves my total program runtime since so much time is spent getting numbers (5.82s vs 2.35s, from multiple benchmarks). So this way of generating ints seems very expensive. Even if I generate them like
Code:
// in UniformRNG
const int gen_lattice_site() { return (int)(uniform_real(engine) * L); }
I get 3.16s vs 2.35s, so while faster it still is quite a bit slower.

I get that using a straight implementation probably introduces less overhead than going through the STL, but I'm surprised that it would make this big of a difference. Am I doing something wrong? (As a sidenote, using default_random_engine instead of mt19937 gives me a speedup, but I'd like to know why the same engine slows down this much in my case.)
 

peakish

Member
Are you compiling with full optimizations? What compiler? And what does your benchmark program look like?
This is with g++ (GCC) 7.1.1 and "-O3 -std=c++14".

My benchmark is simply using "time" in my shell. I realize that's very blunt and measures everything, not just the number generation. I was just curious while rewriting the code for a lab, and use this as a rough measure to make sure I don't slow the program down a lot. The total time variance is +/- 0.01s for multiple runs.
 

Koren

Member
Different strategies in inlining? My first guess would have been overhead to make the PRNG thread-safe (it's one reason PRNGs are slower in other languages), but I don't see anything about <random> being TS.

<random> is a huge step in the right direction to make generation better, but there are still plenty of strange decisions... I was surprised this summer to discover that random_device, described as a "random number generator that produces non-deterministic random numbers", can actually produce... deterministic random numbers.

What the hell?

Edit: I understand that the goal of random_device is to provide a seed to the <random> engines (so in this situation, a deterministic value is better than no value at all), but at the very least, the documentation is awful.
 

Ixian

Member
My work is sending me to the VS Live event in Orlando this year.

Does anyone here have any experience with these conventions? Anything specific I should be looking out for or tips to make the most of it?
My work sent me and another employee to Microsoft Build this year, but I don't really have any super awesome tips for you unfortunately. XD MS Build was fun but a lot of the panels weren't directly related to my day-to-day work; that said, I tried to attend panels about stuff I've heard of but haven't really worked with yet. Also, lots of Azure panels since we're planning on migrating soon.

I guess my suggestions would be:
- Have a schedule in mind but don't feel obligated to stick to it. Some of the panels I thought would be cool turned out to be duds so I would go find others.
- Socialize! Ask people what they do, what tools they work with, etc. There's so much stuff going on in tech that it's impossible to keep up on all of it, plus people may use things differently than you do.
 

Mr.Mike

Member
supervillain_plan.png
 
This is with g++ (GCC) 7.1.1 and "-O3 -std=c++14".

My benchmark is simply using "time" in my shell. I realize that's very blunt and measures everything, not just the number generation. I was just curious while rewriting the code for a lab, and use this as a rough measure to make sure I don't slow the program down a lot. The total time variance is +/- 0.01s for multiple runs.

Maybe try disassembling the code and looking at that?
 

Koren

Member
Out of curiosity, I looked into it with gprof, and <random> is indeed far slower than the original implementation.

But I noticed something strange: the MT engine from <random> was called about twice for each generated number (not EXACTLY twice, sometimes a bit more, sometimes a bit less), and there was a bit of overhead in the test function, probably to merge the two results from the MT engine into a single int.

I checked, and int had 31 digits (just a precaution; I admit I never use "int" if there's any chance the width might be an issue in code, and I hadn't used 64-bit systems until recently).

I tried int32_t for uniform_int_distribution, no change.

But uint32_t in uniform_int_distribution changed it: exactly the right number of calls to the MT engine, and the overhead basically disappeared. It was about as fast as the original implementation (<random> was still slower, but barely).

Could you try replacing <int> with <uint32_t> in your class and see whether things change?


I'd like to know exactly how uniform_int_distribution works with a signed integer... Would you have any reference on that, Cpp_is_king (or anyone else)?
 
I'm not sure if this is the place to be asking questions (apologies if not!), but here's my stackoverflow post:

https://stackoverflow.com/questions/45993330/c-using-scanf-on-strings
I've only just started using C, so forgive the stupid question :p
It's not a stupid question; buffer management and size querying are the source of endless nastiness in C APIs.

If these strings are null-terminated you can just reuse the same buffer and overwrite the last string on every iteration. So choose one buffer size that's "large enough" to handle any potential string and work from there.

Or, you can copy the whole file into a single buffer of bytes and iterate over it.
 

Kopite

Member
If anyone here has used the GAF live thread extension (where it auto-loads newer pages without needing to refresh), I'd like to develop something like that for another forum I frequent. How would I go about doing this, and where would I start?
 
If anyone here has used the GAF live thread extension (where it auto-loads newer pages without needing to refresh), I'd like to develop something like that for another forum I frequent. How would I go about doing this, and where would I start?

I don't really know anything about browser extensions, but my understanding is that for something like that to work in the background you need polling, which means repeatedly sending GET requests to the page at a set interval (5 seconds, 1 second, 2 minutes, etc.). Then you could use AJAX to manipulate the DOM in realtime without needing to refresh the page.

The other option could be to use websockets, although I'm not sure how viable/difficult that would be for the task.
 

Water

Member
Any reason? By fear of mixing the ideas of the two languages?
...
Still, I have a lot of students that learn Python and Caml at the same time, several of them being beginners. I won't say there's never an oddity (how many times have I seen a Python function made needlessly complex to be tail recursive... when Python doesn't do tail-call optimization? :D) but it seems that they do fine.
I'm not afraid that students would mix up the two languages; I think it's just mostly a waste of time to use two different languages or environments at first. My philosophy is that core programming skill and intuition is learned through building and managing more and more complex systems, and anything else is initially a distraction that should be minimized: language syntax, libraries, tools etc. Getting decently fluent in one environment is a necessary investment to get over those distractions. Moving to another environment resets the situation and pulls the focus off the stuff that matters.

Learning more languages, at least those with different paradigms is obviously useful or even necessary for later growth. But it's much more efficient to do it later after the core skills are there and it's just a matter of adapting to what's different in the new environment.

I think several languages / environments is especially bad for the least gifted students. They get overwhelmed and bogged down in details. And when someone needs to sweat through a hard problem, it's easy for them to slip into avoidance behavior where they instead seek immediate sense of satisfaction from looking at and fiddling with new languages, language features, frameworks, tools, etc. A student able to understand and implement a pathfinding algorithm in one language can always pick up new tools and trivia when necessary, whereas there's no guarantee that someone who "knows" a bunch of languages can do anything useful with them.
 

D4Danger

Unconfirmed Member
If anyone here has used the GAF live thread extension (where it auto-loads newer pages without needing to refresh), I'd like to develop something like that for another forum I frequent. How would I go about doing this, and where would I start?

I made that extension. If you want to ask anything I can probably answer. The source code is on github if you didn't know.

I don't really know anything about browser extensions, but my understanding is that for something like that to work in the background you need polling, which means repeatedly sending GET requests to the page at a set interval (5 seconds, 1 second, 2 minutes, etc.). Then you could use AJAX to manipulate the DOM in realtime without needing to refresh the page.

That's basically it. It's really just a timer with a small UI control that sends a request and parses the response. It isn't that complicated. 90% of the code is all the UI stuff, because I didn't want to use a lib like React, to keep it small and in one file.
 
This is with g++ (GCC) 7.1.1 and "-O3 -std=c++14".

My benchmark is simply using "time" in my shell. I realize that's very blunt and measures everything, not just the number generation. I was just curious while rewriting the code for a lab, and use this as a rough measure to make sure I don't slow the program down a lot. The total time variance is +/- 0.01s for multiple runs.

A great website is The Compiler Explorer. It's honestly a bit mindblowing. So anyway, I set it to GCC 7.1.1 and used your exact same command line options. Link here. If you look at the assembly generated for each of the two functions, they are very similar.

Code:
randint():
        sub     rsp, 5016
        xor     ecx, ecx
        mov     edx, 1
        mov     QWORD PTR [rsp], 0
.L137:
        mov     rax, rcx
        shr     rax, 30
        xor     rax, rcx
        imul    rax, rax, 1812433253
        lea     ecx, [rax+rdx]
        mov     QWORD PTR [rsp+rdx*8], rcx
        add     rdx, 1
        cmp     rdx, 624
        jne     .L137
        mov     QWORD PTR [rsp+4992], 624
.L138:
        mov     rdi, rsp
        call    std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()
        cmp     rax, 2147483647
        ja      .L138
        add     rsp, 5016
        ret

Code:
randuint():
        sub     rsp, 5016
        xor     ecx, ecx
        mov     edx, 1
        mov     QWORD PTR [rsp], 0
.L35:
        mov     rax, rcx
        shr     rax, 30
        xor     rax, rcx
        imul    rax, rax, 1812433253
        lea     ecx, [rax+rdx]
        mov     QWORD PTR [rsp+rdx*8], rcx
        add     rdx, 1
        cmp     rdx, 624
        jne     .L35
        mov     rdi, rsp
        mov     QWORD PTR [rsp+4992], 624
        call    std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()
        add     rsp, 5016
        ret

The only real difference between these two pieces of code is that the integer version is checking for -1, and if it encounters -1 it tries again. In practice this is so rare that this is most certainly not what you're encountering. I also don't think that the existence of 1 additional branch should cause those kinds of differences in timing.

The important thing, though, is that they're both calling the unsigned long version of the engine. There is literally no difference in the underlying algorithm being used to generate the numbers.
 

Koren

Member
Doesn't he compare std::random to the original version of Mersenne Twister?

I wonder what the two algorithms you decompiled are, but both are related to std::random, no?
 
Doesn't he compare std::random to the original version of Mersenne Twister?

I wonder what the two algorithms you decompiled are, but both are related to std::random, no?

Just click the second link in my post, it shows you the original source code. I was using int32 and uint32 with a uniform_int_distribution, since you said you observed the same issue as the OP by comparing int32 to uint32.
 

Koren

Member
Sorry... I missed the link.

Still, here is what I get with unsigned vs signed int... Take this quick, dirty and dumb example:
Code:
#include <random>
#include <stdio.h>

using namespace std;

mt19937 engine;

int test1() {
    int total = 0;
  
    uniform_int_distribution<uint32_t> uniform_int;
    
    for(int i=0; i<50000000; ++i)
        total += uniform_int(engine);
    
    return total;
}

int test2() {
    int total = 0;
  
    uniform_int_distribution<int32_t> uniform_int;
    
    for(int i=0; i<50000000; ++i)
        total += uniform_int(engine);
    
    return total;
}

int main(int argc, char* argv[]) {
    int s = 0;
  
    s += test1();
    s += test2();
    return s;
}

I compile this with -pg, and use gprof on it.

Now, here is the interesting part: test1 calls the MT engine 50M times, but test2 calls it a bit more than 100M times, even though both loops generate 50M random integers
Code:
                0.23    0.00 50000000/150006470     test1() [3]
                0.46    0.00 100006470/150006470     test2() [1]
[2]     37.4    0.70    0.00 150006470         std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()() [clone .constprop.7] [2]

Both call an engine configured with <unsigned long>. It seems that to create a signed integer, an average of 2 calls is needed, and I wonder why.

Or is it my own compiler? (Also gcc, but an old one)
 
Wow, so for some reason GCC is just terrible at this.

https://godbolt.org/g/Tp3t8B

GCC 7.2:

Code:
test1():
        push    rbp
        push    rbx
        mov     ebx, 50000000
        sub     rsp, 8
.L35:
        mov     edi, OFFSET FLAT:engine
        call    std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()
        add     ebp, eax
        sub     ebx, 1
        jne     .L35
        add     rsp, 8
        mov     eax, ebp
        pop     rbx
        pop     rbp
        ret

Code:
test2():
        push    r15
        push    r14
        movabs  rax, 9223372032559808512
        push    r13
        push    r12
        xor     edx, edx
        push    rbp
        push    rbx
        mov     ebx, 2147483647
        sub     rbx, rdx
        mov     r14d, 50000000
        sub     rsp, 104
        mov     QWORD PTR [rsp+80], rax
        mov     eax, 4294967294
        cmp     rbx, rax
        ja      .L139
.L208:
        add     rbx, 1
        add     rax, 1
        xor     edx, edx
        div     rbx
        imul    rbx, rax
        mov     rbp, rax
.L140:
        mov     edi, OFFSET FLAT:engine
        call    std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()
        cmp     rbx, rax
        jbe     .L140
        xor     edx, edx
        div     rbp
.L141:
        movsx   rdx, DWORD PTR [rsp+80]
        add     eax, edx
        add     r15d, eax
        sub     r14d, 1
        je      .L170
.L210:
        movsx   rbx, DWORD PTR [rsp+84]
        mov     eax, 4294967294
        sub     rbx, rdx
        cmp     rbx, rax
        jbe     .L208
.L139:
        mov     eax, 4294967295
        cmp     rbx, rax
        jne     .L209
        mov     edi, OFFSET FLAT:engine
        call    std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()
        movsx   rdx, DWORD PTR [rsp+80]
        add     eax, edx
        add     r15d, eax
        sub     r14d, 1
        jne     .L210
.L170:
        add     rsp, 104
        mov     eax, r15d
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        ret

WTF is this? Regardless, as you said, there are 2 calls in here. You'd probably need to look at the IR GCC is outputting to figure out how it comes up with this atrocious codegen. If I had to guess, I'd say it's related to the fact that signed overflow is undefined, and this somehow confuses its optimizer.

Compare to clang (just change the dropdown in the compiler explorer):

Code:
test1():                              # @test1()
        push    rbp
        push    r14
        push    rbx
        sub     rsp, 16
        movabs  rax, -4294967296
        mov     qword ptr [rsp + 8], rax
        mov     ebx, 50000000
        lea     r14, [rsp + 8]
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        mov     esi, engine
        mov     rdi, r14
        mov     rdx, r14
        call    unsigned int std::uniform_int_distribution<unsigned int>::operator()<std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> >(std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>&, std::uniform_int_distribution<unsigned int>::param_type const&)
        add     ebp, eax
        dec     ebx
        jne     .LBB0_1
        mov     eax, ebp
        add     rsp, 16
        pop     rbx
        pop     r14
        pop     rbp
        ret

Code:
test2():                              # @test2()
        push    rbp
        push    r14
        push    rbx
        sub     rsp, 16
        movabs  rax, 9223372032559808512
        mov     qword ptr [rsp + 8], rax
        mov     ebx, 50000000
        lea     r14, [rsp + 8]
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
        mov     esi, engine
        mov     rdi, r14
        mov     rdx, r14
        call    int std::uniform_int_distribution<int>::operator()<std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> >(std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>&, std::uniform_int_distribution<int>::param_type const&)
        add     ebp, eax
        dec     ebx
        jne     .LBB1_1
        mov     eax, ebp
        add     rsp, 16
        pop     rbx
        pop     r14
        pop     rbp
        ret
 

Koren

Member
A great website is The Compiler Explorer. It's honestly a bit mindblowing.
Bookmarked, it's just great... Many thanks...

But wait...

Code:
.L138:
        mov     rdi, rsp
        call    std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()
        cmp     rax, 2147483647
        ja      .L138

This part jumps back to .L138 if the result is above 0x7FFFFFFF, no?

If the result is a 32-bit number, there's a 50/50 chance of jumping, no?

That would mean an average of Σ i/2^i calls (summed over i ≥ 1), which equals 2, exactly what I see...
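(Spelling out that arithmetic with a quick check: if each draw is rejected with probability 1/2, the expected number of engine calls is Σ i·(1/2)^i over i ≥ 1, which sums to 2.)
Code:
# Expected number of engine calls if each draw is kept with probability 1/2:
#   E[calls] = sum over i >= 1 of i * (1/2)**i = 2
approx = sum(i * 0.5**i for i in range(1, 200))
print(approx)   # ~2.0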

Or did I miss something?
 
Actually it's not clear that clang is better than GCC here just because of the shorter code. GCC appears to be inlining std::uniform_int_distribution<int>::operator() while clang is emitting a call to it. clang emits calls to uniform_int_distribution<uint32_t>::operator() as a separate call as well. Probably makes more sense to look directly at the standard library implementation than the assembly at this point; it looks like the two implementations are just different (undoubtedly to avoid signed overflow)
 

Koren

Member
I still don't understand how the signed version of the function can return a negative value if the function calls the engine again whenever the value is above 0x7FFFFFFF (compared as unsigned, so basically a < 0 test, no?)

??

And why do it to begin with?

Going
Code:
uniform_int_distribution<unsigned> uniform;

int k = reinterpret_cast<int>(uniform(engine));
gives you a 2x increase in speed?!
 
Yea so the libcxx implementation that clang is using looks like this (comments added):

Code:
template<class _IntType>
template<class _URNG>
typename uniform_int_distribution<_IntType>::result_type
uniform_int_distribution<_IntType>::operator()(_URNG& __g, const param_type& __p)
{
    // Always generate uint32 or uint64 as the underlying type, regardless of whether or
    // not the result type is signed.
    typedef typename conditional<sizeof(result_type) <= sizeof(uint32_t),
                                            uint32_t, uint64_t>::type _UIntType;

    // The width of the range that we're interested in is [max - min + 1], as an unsigned.
    const _UIntType _Rp = __p.b() - __p.a() + _UIntType(1);

    // If the range is only one number wide, just return it, there's no randomness.
    if (_Rp == 1)
        return __p.a();

    // What's the max number of digits for this type?
    const size_t _Dt = numeric_limits<_UIntType>::digits;

    // Get the thing that generates random bits for this unsigned type.
    typedef __independent_bits_engine<_URNG, _UIntType> _Eng;
    if (_Rp == 0)
        return static_cast<result_type>(_Eng(__g, _Dt)());

    size_t __w = _Dt - __clz(_Rp) - 1;
    if ((_Rp & (std::numeric_limits<_UIntType>::max() >> (_Dt - __w))) != 0)
        ++__w;
    _Eng __e(__g, __w);

    // Keep trying to get a random number until it's less than the width we're interested in.
    _UIntType __u;
    do
    {
        __u = __e();
    } while (__u >= _Rp);
    return static_cast<result_type>(__u + __p.a());
}

But why isn't it just reinterpreting the bits as a signed number? For that we have to look at this param_type thing.

Code:
    class param_type
    {
        result_type __a_;
        result_type __b_;
    public:
        typedef uniform_int_distribution distribution_type;

        // Oh, by default it's [0, max)
        explicit param_type(result_type __a = 0,
                            result_type __b = numeric_limits<result_type>::max())
            : __a_(__a), __b_(__b) {}

        result_type a() const {return __a_;}
        result_type b() const {return __b_;}

        friend bool operator==(const param_type& __x, const param_type& __y)
            {return __x.__a_ == __y.__a_ && __x.__b_ == __y.__b_;}
        friend bool operator!=(const param_type& __x, const param_type& __y)
            {return !(__x == __y);}
    };

So yea, I guess this makes sense. Using an unsigned, it's trying to generate from [0, UINT_MAX], and for an int it's trying to generate from [0, INT_MAX].
 
Probably you can get the same speed increase by just including negative numbers in the range.

Code:
    int64_t total = 0;
  
    uniform_int_distribution<int32_t> uniform_int(std::numeric_limits<int32_t>::min(), std::numeric_limits<int32_t>::max());
    
    for(int i=0; i<50000000; ++i)
        total += uniform_int(engine);
    
    return total;
 
I still don't understand how the signed version of the function can return a negative value if the function calls the engine again whenever the value is above 0x7FFFFFFF (compared as unsigned, so basically a < 0 test, no?)

??

And why do it to begin with?

Going
Code:
uniform_int_distribution<unsigned> uniform;

int k = reinterpret_cast<int>(uniform(engine));
gives you a 2x increase in speed?!

This won't compile. You can't reinterpret_cast from unsigned to signed
 

Koren

Member
This won't compile. You can't reinterpret_cast from unsigned to signed
Oops, sorry, I'm tired. Make it static_cast (I had something else in mind at first, using a reinterpret_cast<int &>, and I mixed them up)

So yea, I guess this makes sense.
Somehow... Not the kind of thing you would expect at first, though. And there's still something I don't understand in the assembly with the ja jump. I'll try again when I'm less tired :)

Probably you can get the same speed increase by just including negative numbers in the range.
Just checked, it does...

The kinds of things you have to do to avoid a 2x slowdown :D


Peakish > pretty sure that's your answer... but if you can confirm it solves your issue, that'd be welcome. Strange but enlightening issue.
 
Oops, sorry, I'm tired. Make it static_cast (I had something else in mind at first, using a reinterpret_cast<int &>, and I mixed them up)

static_cast is undefined behavior :(. Signed overflow

Somehow... Not the kind of thing you would expect at first, though. And there's still something I don't understand in the assembly with the ja jump. I'll try again when I'm less tired :)

The ja is to try again if it's not in [0, int_max)
 

Koren

Member
Water > Thanks for the long answer... I must say I agree on most points. I admit I was mostly playing the devil's advocate. For most people, I think it's suboptimal, but I don't think it's that bad.

I think several languages / environments is especially bad for the least gifted students. They get overwhelmed and bogged down in details.
I agree... I should have said that the students here that learn Python and ML almost at the same time are among the best students, they're not the norm.

Learning more languages, at least those with different paradigms is obviously useful or even necessary for later growth. But it's much more efficient to do it later
Most probably. Though I wonder what "later" is (probably heavily depends on the people involved and on the time they spent on learning)
 

Koren

Member
static_cast is undefined behavior :(. Signed overflow
(smacking the head on the desk noise)

I love how I did most lower-level stuff using simple casts (like (int)) in the past, which worked perfectly well. Then, since C++11, I've been doing less and less of those dirty things.

Make it (int)(x) then ;)

Yes, that's not nice; reinterpret_cast was indeed the right solution, but it needs to go through memory.


(pretty sure that static_cast works for basically all compilers that don't go the extra mile to produce really undefined behaviors on purpose, though ;) )

The ja is to try again if it's not in [0, int_max)
Yes, but how do you get negative random numbers? That's what I don't understand.

But seeing how I'm dumb tonight, that must be something obvious I missed.
 
(smacking the head on the desk noise)

I love how I did most lower-level stuff using simple casts (like (int)) in the past, which worked perfectly well. Then, since C++11, I've been doing less and less of those dirty things.

Make it (int)(x) then ;)

Yes, that's not nice; reinterpret_cast was indeed the right solution, but it needs to go through memory.



Yes, but how do you get negative random numbers? That's what I don't understand.

But seeing how I'm dumb tonight, that must be something obvious I missed.

Well if you allow them in your range, then that ja condition will be different. The algorithm generates an *offset* to be used from the start of the range, so if your minimum value is negative, and the offset is smaller than the absolute value of the min, you'll get a negative number
 

Koren

Member
Well if you allow them in your range, then that ja condition will be different. The algorithm generates an *offset* to be used from the start of the range, so if your minimum value is negative, and the offset is smaller than the absolute value of the min, you'll get a negative number
The thing is, I don't see where they add the offset, and if you want a number in the [-2^31 .. 2^31-1) range, you should create one in the [0 .. 2^32-1) range, no?

But don't bother, I'll take my time with the assembly tomorrow, after a good sleep :) I shouldn't try to understand half a dozen pages of assembly with just two dozen lines.
 