Given a continuous integer range $[1, V]$, we want to generate a sequence of $n$ distinct numbers chosen uniformly at random from it.

This is very useful for sampling. For example, when you have an array and want to draw several distinct elements from it, Python provides `random.sample` to do this.

Here I will show some ways to generate such a sequence.

Let's consider the simplest algorithm, *Method 1*. We can put the entire value range directly into an array and then shuffle it. If we take the first $n$ elements, we are done. Both the time and the space complexity are $O(V)$.
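A minimal sketch of this (the function name and interface are my own choices, not from any original code):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Method 1 sketch: materialize the whole range [1, V], shuffle it,
// and keep the first n values. O(V) time and O(V) memory.
std::vector<long long> gen_shuffle(long long n, long long V) {
    assert(1 <= n && n <= V);
    std::vector<long long> a(V);
    std::iota(a.begin(), a.end(), 1LL);      // a = 1, 2, ..., V
    std::mt19937_64 rng(std::random_device{}());
    std::shuffle(a.begin(), a.end(), rng);
    a.resize(n);                             // keep the first n shuffled values
    return a;
}
```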

Actually, there's an even simpler idea, but with a flaw: instead of shuffling the whole array, for each of the first $n$ positions just randomly pick an element from the remaining suffix and swap it in (a partial Fisher–Yates shuffle). The flaw is that we still need an $O(V)$ `std::vector` holding the entire range.

Analyzing correctness: Assuming the current selection is the $i$-th one, positions $1$ through $i-1$ already hold the chosen values, and we swap a uniformly random element from positions $i$ through $V$ into position $i$.

The probability of selecting any particular unused number at this step is $\frac{1}{V-i+1}$, the same for all of them, so by induction every sequence of $n$ distinct numbers is generated with equal probability.
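Here is how I would sketch that partial shuffle (again, the interface is my own):

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Partial Fisher-Yates sketch: only the first n positions get shuffled.
// Each step swaps a uniformly random element of the suffix into place,
// but we still pay O(V) memory for the full range.
std::vector<long long> gen_partial_shuffle(long long n, long long V) {
    assert(1 <= n && n <= V);
    std::vector<long long> a(V);
    std::iota(a.begin(), a.end(), 1LL);
    std::mt19937_64 rng(std::random_device{}());
    for (long long i = 0; i < n; ++i) {
        std::uniform_int_distribution<long long> pick(i, V - 1);
        std::swap(a[i], a[pick(rng)]);       // each candidate equally likely
    }
    a.resize(n);
    return a;
}
```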

*Method 2* is the same naive idea as before: just generate numbers without worrying about duplicates. What if we encounter a duplicate? Don't worry about it! Remember how we calculate quadratic residues? We pick a number at random, and if it doesn't fit, we simply pick another one. The only concern is that after many selections, we might run into duplicates again and again.

However, since we are picking numbers randomly, the running time is random too :) So we have to calculate its expectation.

First, let's solve a similar problem:

Perform a task repeatedly. Each attempt succeeds independently with probability $p$. What is the expected number of attempts until the first success?

This is just a geometric series! The expectation becomes

$$E=\sum_{k\ge 1}k\,p\,(1-p)^{k-1}=\frac{p}{\bigl(1-(1-p)\bigr)^{2}}=\frac{1}{p}.$$

Oops! The expected number of attempts is exactly the inverse of the probability of success! (Although it's a classic result and seems quite intuitive QwQ)
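As a quick numerical sanity check of this identity, here is a throwaway helper of mine that truncates the series:

```cpp
#include <cassert>
#include <cmath>

// Truncated geometric series: sum_{k=1..kmax} k * p * (1-p)^(k-1).
// The tail vanishes geometrically, so for modest kmax the result should
// land right on 1/p.
double geometric_expectation(double p, int kmax) {
    assert(0.0 < p && p <= 1.0);
    double sum = 0.0, q = 1.0;               // q tracks (1-p)^(k-1)
    for (int k = 1; k <= kmax; ++k) {
        sum += k * p * q;
        q *= 1.0 - p;
    }
    return sum;
}
```

For example, `geometric_expectation(0.25, 10000)` comes out at $4 = 1/0.25$ up to rounding.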

We want to find the complexity of randomly choosing a number that has not appeared yet, and it suffices to compute an upper bound. Obviously, the worst case occurs when selecting the $n$-th (last) number: only $V-n+1$ of the $V$ values are still unused, so the success probability is $\frac{V-n+1}{V}$ and the expected number of attempts is $\frac{V}{V-n+1}$.

So if we choose $n\le\frac{V}{2}$, every pick needs at most $2$ expected attempts, the total expected number of attempts is $O(n)$, and with a balanced BST checking for duplicates the overall expected complexity is $O(n\log n)$.
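A sketch of this retry loop, using `std::set` as the balanced BST (interface mine):

```cpp
#include <cassert>
#include <random>
#include <set>
#include <vector>

// Method 2 sketch: draw from [1, V], retry on duplicates, and track seen
// values in a balanced BST. For n <= V/2 each position needs at most 2
// expected draws, so the whole loop is expected O(n log n).
std::vector<long long> gen_retry(long long n, long long V) {
    assert(n <= V);
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long long> draw(1, V);
    std::set<long long> seen;
    std::vector<long long> out;
    while ((long long)out.size() < n) {
        long long x = draw(rng);
        if (seen.insert(x).second)           // true iff x is new
            out.push_back(x);
    }
    return out;
}
```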

However, the above algorithm cannot satisfy all scenarios. For instance, when $n>\frac{V}{2}$, the success probability of the later picks drops and the bound above no longer applies.

Then, similar to the analysis above, the $i$-th pick needs $\frac{V}{V-i+1}$ expected attempts, and we sum this over all $n$ picks.

Which leads to:

$$\sum_{i=1}^{n}\frac{V}{V-i+1}=V\,(H_V-H_{V-n})\approx V\ln\frac{V}{V-n}.$$

From this, we derive that even in the extreme case $n=V$ the expected total number of attempts is $V\,H_V=O(V\log V)$ (the classic coupon-collector bound), so *Method 2* never needs more than $O(V\log V)$ expected attempts.

Of course, there's more room for optimization. For instance, replacing the balanced BST with `std::sort`-based batch deduplication allows for constant factor optimizations in time and space. This results in a time and space complexity of $O(n\log n)$ and $O(n)$ respectively, with smaller constants.

Another optimization concerns the time complexity. Checking whether a number has appeared can be done not only with a BST but also with a hash table. This improves each check to expected $O(1)$, but the constant factors can be very large. In this case, the complexity of the second method becomes just the expected number of attempts, $\sum_{i=1}^{n}\frac{V}{V-i+1}$.

Solving this, we get an expected $O(n)$ total for $n\le\frac{V}{2}$.
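The change from the `std::set` version is essentially one line; something like (again my sketch):

```cpp
#include <cassert>
#include <random>
#include <unordered_set>
#include <vector>

// Same retry loop, but duplicate checks go through a hash table:
// expected O(1) per check, so expected O(n) total for n <= V/2, at the
// cost of heavier constants per operation.
std::vector<long long> gen_retry_hash(long long n, long long V) {
    assert(n <= V);
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long long> draw(1, V);
    std::unordered_set<long long> seen;
    seen.reserve(2 * n);                     // avoid rehashing mid-run
    std::vector<long long> out;
    while ((long long)out.size() < n) {
        long long x = draw(rng);
        if (seen.insert(x).second) out.push_back(x);
    }
    return out;
}
```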

Still related to checking whether a number has appeared: when the value range is small, a `std::bitset` (one bit per value) can greatly optimize the constants. When $V$ is small enough for the bitset to sit in cache, each check is essentially free.
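For instance, with a compile-time bound on the range (the bound and the names here are assumptions of mine):

```cpp
#include <bitset>
#include <cassert>
#include <random>
#include <vector>

const int VMAX = 1 << 20;                    // assumed upper bound on V
std::bitset<VMAX + 1> used;                  // one bit per value

// Retry loop with bitset membership tests: each check touches a single
// bit, so the constant factor is tiny while the bitset fits in cache.
std::vector<int> gen_retry_bitset(int n, int V) {
    assert(n <= V && V <= VMAX);
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> draw(1, V);
    std::vector<int> out;
    while ((int)out.size() < n) {
        int x = draw(rng);
        if (!used[x]) { used[x] = true; out.push_back(x); }
    }
    for (int x : out) used[x] = false;       // leave the bitset clean for reuse
    return out;
}
```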

In many cases, we don't require such strict randomness. Therefore, we can divide the value range equally into $n$ blocks and pick one number uniformly from each block: the results are distinct by construction and cost only $O(n)$, though the distribution over $n$-subsets is no longer uniform.
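That idea fits in a few lines (a sketch, names mine):

```cpp
#include <cassert>
#include <random>
#include <vector>

// Pick one value uniformly from each of n roughly equal blocks of [1, V].
// The outputs are distinct (the blocks are disjoint) and it runs in O(n),
// but the distribution over n-subsets is not uniform: for example, two
// chosen values can never land in the same block.
std::vector<long long> gen_blocks(long long n, long long V) {
    assert(1 <= n && n <= V);
    std::mt19937_64 rng(std::random_device{}());
    std::vector<long long> out;
    for (long long i = 0; i < n; ++i) {
        long long lo = V * i / n + 1;        // block (V*i/n, V*(i+1)/n]
        long long hi = V * (i + 1) / n;
        std::uniform_int_distribution<long long> draw(lo, hi);
        out.push_back(draw(rng));
    }
    return out;
}
```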

While I was sleeping I thought of another method, *Method 3*. I'm not sure if it's correct, so if you think it is wrong, please tell me in the comments. Specifically, we first randomly select $n$ numbers allowing duplicates, sort them, and remove the duplicates; if only $m<n$ distinct numbers remain, we draw another $n-m$, deduplicate again, and repeat until we have $n$ distinct numbers. The table below lists the expected number of rounds (`Exp`) for various ratios `n/V`:

```
n/V Exp
0.0001 1
0.0002 1
0.0004 1
0.0008 1
0.0016 1
0.0032 1
0.0064 2
0.0128 2
0.0256 2
0.0512 3
0.1024 4
0.2048 6
0.4096 11
0.8192 47.7
```

It's evident that when the ratio `n/V` is small, only a couple of rounds are needed, while for large ratios the expectation explodes; so switching to *Method 2* to generate the rest once the ratio gets large can slightly optimize the constant factors in time and space.
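The round loop I have in mind looks roughly like this (my own sketch, without the fallback to *Method 2*):

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Method 3 sketch: each round draws as many values as are still missing,
// then sorts and deduplicates everything. The result comes out sorted.
std::vector<long long> gen_batch(long long n, long long V) {
    assert(n <= V);
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long long> draw(1, V);
    std::vector<long long> out;
    while ((long long)out.size() < n) {
        long long need = n - (long long)out.size();
        for (long long i = 0; i < need; ++i) out.push_back(draw(rng));
        std::sort(out.begin(), out.end());
        out.erase(std::unique(out.begin(), out.end()), out.end());
    }
    return out;
}
```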

Attached is the code for generating the table: tester.cpp.

After I wrote the implementation of the method above, I wondered: if I replace *Method 2* with *Method 3*, will it be faster?

It seems that in *Method 3* the constant becomes too large when the ratio is inappropriate. But compared to the constant of *Method 2*, is this constant truly large? In addition, the ratio should stay below 0.5, so the constant reaches a maximum of about 20. Is this really more optimal?

Indeed it is.

Here is the table using only *Method 3* (unsigned int): M3table.

The table above represents the performance of *Method 3* when fully utilizing the value range, up to `n/V = 0.8` in each group, so it looks good.

Running the benchmark, this version appears to be slower than the code above in cases where the value range is very small. However, we have a `std::bitset` approach that strictly outperforms the hash approach for small value ranges. Combining these methods results in an optimal solution!

For *Method 3*, we can still optimize! Notice that the previously generated numbers are already sorted among themselves; only the newly generated numbers are unordered. So we just need to sort the newly generated numbers and merge the two sorted runs. A smaller constant, just like that!
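With `std::inplace_merge`, each round then only sorts the freshly drawn suffix (again a sketch of mine):

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Method 3 with the merge optimization: the prefix that survived earlier
// rounds is already sorted, so each round sorts only the new suffix and
// merges the two sorted runs in place before deduplicating.
std::vector<long long> gen_batch_merge(long long n, long long V) {
    assert(n <= V);
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long long> draw(1, V);
    std::vector<long long> out;
    while ((long long)out.size() < n) {
        long long old = (long long)out.size();
        for (long long i = old; i < n; ++i) out.push_back(draw(rng));
        std::sort(out.begin() + old, out.end());          // new values only
        std::inplace_merge(out.begin(), out.begin() + old, out.end());
        out.erase(std::unique(out.begin(), out.end()), out.end());
    }
    return out;
}
```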

For *Optimization 3*, there's even more room. If we want to generate sequences many times, we can preprocess the bitset for better constants. That is, if you keep a single bitset around and clear it with `std::bitset::reset`, it'll be faster than declaring a fresh one every time.
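One step further (my own variation on that idea): instead of `reset()`-ing the whole bitset, remember which positions a run actually set and clear only those.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

constexpr int RANGE = 10000000;              // assumed value-range bound
std::bitset<RANGE + 1> mark;                 // declared once, reused forever

// Clearing only the touched positions costs O(n) per run, instead of the
// O(RANGE / 64) word-by-word zeroing that mark.reset() would do.
void clear_touched(const std::vector<int>& touched) {
    for (int x : touched) mark[x] = false;
}
```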

**Here's the final code: UNIG.cpp.**

Then we can check whether the program is correct using the program below, which computes the sample mean and variance of the generated numbers for $V=10^{9}$:

```
#include<iostream>
#include<cstdio>
using namespace std;
#define int long long
int x[10000005];
signed main() {
    freopen("test.in","r",stdin);
    int n;
    cin>>n;
    for(int i=1;i<=n;i++) cin>>x[i];
    cout.precision(10);
    double avg=0;
    for(int i=1;i<=n;i++) avg+=x[i];
    avg/=n;
    cout<<"Average number: "<<avg<<endl;
    double var=0;
    for(int i=1;i<=n;i++) var+=(x[i]-avg)*(x[i]-avg);
    cout<<"Variance value: "<<var/(n-1)<<endl;
    return 0;
}
```

Result:

```
Average number: 499849719.6
Variance value: 8.33604178e+16
```

We know mathematically that the expected mean is $\frac{V+1}{2}\approx 5\times10^{8}$ and that the variance of a uniform distribution over $[1,V]$ is $\frac{V^{2}-1}{12}\approx 8.33\times10^{16}$, both of which match the measured values nicely.

In this edition, I only kept the important content. There is also a Chinese edition of this article, which is the original one; it contains more interesting asides, but they are not important.

Thanks to ChatGPT and DeepL. They helped me to improve the English expression of this blog.