(→4. Simulation results) |
|||
(56 intermediate revisions by 2 users not shown) | |||
Line 32: | Line 32: | ||
[mailto:slobodan.bojanic@upm.es slobodan.bojanic@upm.es]</div> | [mailto:slobodan.bojanic@upm.es slobodan.bojanic@upm.es]</div> | ||
--> | --> | ||
− | |||
==Abstract== | ==Abstract== | ||
Line 53: | Line 52: | ||
==2. Problem statement== | ==2. Problem statement== | ||
− | Let the random number generator (RNG) generates integers from the given range [1 | + | Let the random number generator (RNG) generates integers from the given range <math>[1\cdots N] </math> (denoted as set <math>R</math>) long enough to guarantee that each number from <math>R</math> appears at least once. After RNG generates a number, the check is made if this number is already generated before. If not, it is used in application and some record of numbers generated so far is updated; otherwise, RNG generates a new number. The process is carried on until all numbers from <math>R</math> are generated and included into the record. |
− | For the sake of clarity of the following presentation and analysis, during the process of random generation we will maintain two sets. The record of numbers generated so far is kept as a set of integers (denoted as | + | For the sake of clarity of the following presentation and analysis, during the process of random generation we will maintain two sets. The record of numbers generated so far is kept as a set of integers (denoted as <math>I</math>). In order to save the space, it is implemented in an appropriate form of intervals of consecutive integers. These intervals of already used numbers will be referred as <math>I</math>–intervals. The remaining numbers from <math>R</math> are kept in a set of unused integers (denoted as <math>S</math>), also implemented in the form of intervals (<math>S</math>–intervals). In both set implementations, only lower and upper bound for each interval are recorded in memory. The interval can have only one element, referred to as ''single element interval'' (e.g., <math>[a, a]</math>), or more than one element, referred to as ''multiple element interval'' (e.g., [<math>a</math>, <math>b</math>], <math>a < b</math>). Since <math>S</math> and <math>I</math> sets are complements in respect to set <math>R</math> (i.e., <math> S+I=R</math>), in practical implementation it is sufficient to maintain only one of these sets for memory efficiency. [[#img-1|Figure 1]] illustrates alternating layout of <math>I</math>–intervals and <math>S</math>–intervals within the set <math>R</math>. Used numbers are denoted as circles and <math>I</math>–intervals are represented as shaded boxes, while unused numbers are denoted with squares and <math>S</math>–intervals are represented as white boxes. |
+ | <div id='img-1'></div> | ||
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | {| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | ||
|- | |- | ||
Line 65: | Line 65: | ||
− | Let we start with set of integers | + | Let we start with set of integers <math>S = \left\{x | 1 \leq x \leq N\right\}</math> (i.e., <math>S = R</math>) and an empty set <math>I = \O</math>. In each step, we randomly choose an element <math>x_i</math> from <math>S</math> and move it to <math>I</math>; i.e., <math>S=S-[x_i]</math> and <math>I=I+[x_i]</math>. Consequently, after <math>N</math> steps we will have <math>S = \O</math> and <math>I = R</math>. In each step, the record of numbers already chosen and included into <math>I</math> is updated in the following way: |
− | : | + | :A. In step <math>t = 1</math>, an element <math>x_1 \in S</math> is randomly chosen and included into <math>I</math>. Now, there is only one <math>I</math>-interval, [<math>x_1</math>, <math>x_1</math>]. |
− | : | + | :B. In step <math>t = i, 2 \le i \le N</math>, currently randomly chosen element <math>x_i \in S</math> is checked for adjacency to existing <math>I</math>–intervals with three possible outcomes: |
− | :1. if it is not adjacent to any of existing | + | :1. if it is not adjacent to any of existing <math>I</math>–intervals, a new <math>I</math>–interval <math>[x_i, x_i]</math> is created, |
− | :2. if it is adjacent to only one interval [ | + | :2. if it is adjacent to only one interval [<math>a, b]</math>, this <math>I</math>–interval is extended with <math>x_i</math> (e.g., if <math>x_i = a-1</math>, interval <math>[a, b]</math> is extended to <math>[x_i, b]</math> or if <math>x_i = b + 1</math>, interval <math>[a, b]</math> is extended to <math>[a, x_i]</math>), |
− | :3. if it is adjacent to two | + | :3. if it is adjacent to two <math>I</math>–intervals <math>[a, b]</math> and <math>[c, d]</math> in a way that <math>x_i = b + 1</math> and <math>x_i = c - 1</math>, these two intervals are merged into interval <math>[a, d]</math> (<math>x_i</math> previously corresponded to a single element <math>S</math>–interval). |
− | Obviously, after an element | + | Obviously, after an element <math>x_i</math> is included into <math>I</math>, three outcomes are possible in a step: |
− | :* number of | + | :* number of <math>I</math>–intervals is increased by 1 (cases A and B.1), |
− | :* number of | + | :* number of <math>I</math>–intervals stays the same (case B.2), |
− | :* number of | + | :* number of <math>I</math>–intervals is decreased by 1 (case B.3). |
− | In step | + | In step <math>t = 1</math>, the number of <math>I</math>–intervals is 1. Then, it increases until some maximum <math>M</math> is reached as new <math>I</math>–intervals are created. After reaching maximum, it is expected that <math>I</math>–intervals are progessively coalescing, and their number will fall until only one interval, [1, <math>N</math>], remains in step <math>t = N</math> (all elements from <math>S</math> are now moved to <math>I</math>, <math>S=\O</math> and <math>I= R</math>). |
− | For practical viability of this procedure, one of the crucial things is the memory needed for keeping the record of generated elements in set | + | For practical viability of this procedure, one of the crucial things is the memory needed for keeping the record of generated elements in set <math>I</math> (and also of unused elements in <math>S</math>). Memory requirements are directly proportional to <math>M</math> since <math>I</math> can be implemented as a dynamic structure in which only upper and lower bound of each interval are recorded, as it will be explained in Section 4. In the following sections, the maximum number of <math>I</math>–intervals <math>M</math> is determined by both analytical and simulation means. |
==3. Analytical solution== | ==3. Analytical solution== | ||
− | Let us assume that in step | + | Let us assume that in step <math>t = i</math>, <math>1 \leq i \leq N</math>, an element <math>x_i</math> is chosen from <math>S</math> and moved to <math>I</math>. After that, the number of elements in <math>S</math> is <math> N - i </math>. Let the number of <math>I</math>–intervals after adding <math>x_i</math> to <math>I</math> be denoted as <math>r_i</math>. In the next step <math>t = i + 1</math>, <math>i\neq N</math>, an element <math>x_{i+1}</math> is chosen from <math>S</math>. As previosly elaborated, its inclusion into <math>I</math> can lead to: |
− | :* decreased number of | + | :* decreased number of <math>I</math>–intervals - <math>r_i -1</math> (merging of two intervals), |
− | :* increased number of | + | :* increased number of <math>I</math>–intervals - <math>r_i +1</math> (creating a new interval), |
− | :* the same number of | + | :* the same number of <math>I</math>–intervals - <math>r_i</math> (extending an existing interval). |
− | Let the symbols | + | Let the symbols <math display="inline">{v}_{g}</math>, <math display="inline">{v}_{l}</math> and <math display="inline">{v}_{e}</math> stand for probabilities that the number of <math>I</math>–intervals will increase, decrease or stay the same, respectively. Also, the following equation must hold |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
|- | |- | ||
− | + | | <math>{v}_{g}+{v}_{l}+{v}_{e\, }=1</math> | |
|} | |} | ||
+ | In a given step, these probabilities depend on current positions of <math>I</math>–intervals of used numbers within range <math>R</math>. Three possible layouts of range <math>R</math> are shown in [[#img-2|Figure 2]]. <math>I</math>–intervals are sequenced from 1 to <math>r_i</math> and again represented by shaded boxes, while intervening <math>S</math>–intervals of unused numbers are represented by white boxes. | ||
− | + | <div id='img-2'></div> | |
− | + | ||
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | {| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | ||
|- | |- | ||
|style="padding:10px;"| [[Image:draft_Bojanic_923522951-image2-c.png|324px]] | |style="padding:10px;"| [[Image:draft_Bojanic_923522951-image2-c.png|324px]] | ||
|- | |- | ||
− | |style="font-size: 85%;"| 1) Layout | + | |style="font-size: 85%;"| 1) Layout <math>A</math> (only one range bound, 1 or <math>N</math>, belongs to <math>I</math>) |
|- | |- | ||
|style="padding:10px;"| [[Image:draft_Bojanic_923522951-image3-c.png|318px]] | |style="padding:10px;"| [[Image:draft_Bojanic_923522951-image3-c.png|318px]] | ||
|- | |- | ||
− | |style="font-size: 85%;"|2) Layout ''B'' (both range bounds, 1 and | + | |style="font-size: 85%;"|2) Layout ''B'' (both range bounds, 1 and <math>N</math>, belong to <math>I</math>) |
|- | |- | ||
|style="padding:10px;"| [[Image:draft_Bojanic_923522951-image4-c.png|324px]] | |style="padding:10px;"| [[Image:draft_Bojanic_923522951-image4-c.png|324px]] | ||
|- | |- | ||
− | |style="font-size: 85%;"|3) Layout | + | |style="font-size: 85%;"|3) Layout <math>C</math> (neither range bound belongs to <math>I</math>) |
|- style="text-align: center; font-size: 75%;" | |- style="text-align: center; font-size: 75%;" | ||
− | | colspan="1" style="padding:10px;"| '''Figure 2'''. Possible layouts of | + | | colspan="1" style="padding:10px;"| '''Figure 2'''. Possible layouts of <math>I</math>–intervals within range ''R'' |
|} | |} | ||
− | Let | + | Let <math>v_A(i)</math>, <math>v_B(i)</math> and <math>v_C(i)</math> denote the probabilities of layouts <math>A</math>, <math>B</math> and <math>C</math> after step <math>t = i</math>, respectively, and <math display="inline">{C}_{i}^{N}</math>is the number of <math>i</math>-combinations from set of <math>N</math> elements. Then, these probabilities can be calculated as: |
− | + | <div class="auto" style="text-align: center;"> | |
− | + | <math display="inline">{v}_{A}(i)=2\displaystyle\frac{{C}_{i-1}^{N-2}}{{C}_{i}^{N}}=2\displaystyle\frac{\left({}_{i-1}^{N-2}\right) }{\left({}_{i}^{N}\right) }=</math><math>2i\displaystyle\frac{N-i}{N(N-1)}</math> | |
− | + | ||
− | + | ||
− | + | <math display="inline">{v}_{B}(i)=\displaystyle\frac{{C}_{i-2}^{N-2}}{{C}_{i}^{N}}=\displaystyle\frac{\left({}_{i-2}^{N-2}\right) }{\left({}_{i}^{N}\right) }=</math><math>i\displaystyle\frac{i-1}{N(N-1)}</math> | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
+ | <math display="inline">{v}_{C}(i)=\displaystyle\frac{{C}_{i}^{N-2}}{{C}_{i}^{N}}=\frac{\left({}_{i}^{N-2}\right) }{\left({}_{i}^{N}\right) }=</math><math>\displaystyle\frac{(N-i)(N-i-1)}{N(N-1)}</math> | ||
+ | </div> | ||
− | We will now derive the expressions for probabilities | + | We will now derive the expressions for probabilities <math>v_l</math>, <math>v_g</math> and <math>v_e</math> in case of each particular layout. To this end, we define an important parameter denoted as <math>m_i</math>. It represents an average number of single element <math>S</math>–intervals after step <math>t = i</math>. This type of interval is important since its only element which belongs to <math>S</math> separates two consecutive <math>I</math>–intervals and its selection from <math>S</math> and inclusion into <math>I</math> in some later step makes that these two <math>I</math>–intervals are merged into one. |
− | Let us assume that in step | + | Let us assume that in step <math>t = i + 1</math> one of <math>N-i</math> elements from <math>S</math> is chosen. |
===Layout A=== | ===Layout A=== | ||
− | In case of this layout, the number of | + | In case of this layout, the number of <math>I</math>–intervals (<math>r_i</math>) is equal to the number of <math>S</math>–intervals (Figure 2a). The number of <math>I</math>–intervals can be decreased only if the chosen number belongs to some single element <math>S</math>–interval other than [1, 1] and <math>[N, N]</math>. The probability that single element <math>S</math>–interval lies at 1 or <math>N</math> is <math>m_i/r_i</math>, while the probability of the opposite case is 1 – <math>m_i/r_i</math>. Therefore, the probability of merging two consecutive <math>I</math>–intervals is |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 158: | Line 150: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(A)=\frac{{m}_{i}}{{r}_{i}}\frac{{m}_{i}-1}{N-i}+\frac{{r}_{i}-{m}_{i}}{{r}_{i}}\frac{{m}_{i}}{N-i}=</math><math>\frac{{m}_{i}}{N-i}\frac{{r}_{i}-1}{{r}_{i}}</math> | + | | <math display="inline">{v}_{l}(A)=\displaystyle\frac{{m}_{i}}{{r}_{i}}\displaystyle\frac{{m}_{i}-1}{N-i}+\frac{{r}_{i}-{m}_{i}}{{r}_{i}}\displaystyle\frac{{m}_{i}}{N-i}=</math><math>\displaystyle\frac{{m}_{i}}{N-i}\displaystyle\frac{{r}_{i}-1}{{r}_{i}}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (1) | | style="width: 5px;text-align: right;white-space: nowrap;" | (1) | ||
Line 164: | Line 156: | ||
− | The number of | + | The number of <math>I</math>–intervals can be increased only if the chosen number does not belong to some single element <math>S</math>–interval and also if it is not adjacent to some bound of any <math>I</math>–interval. The probability of such a case is |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 176: | Line 163: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(A)=1+\frac{{m}_{i}}{N-i}\frac{{r}_{i}-1}{{r}_{i}}+</math><math>\frac{1-2{r}_{i}}{N-i}</math> | + | | <math display="inline">{v}_{g}(A)=\displaystyle\frac{{m}_{i}}{{r}_{i}}\frac{N-i-{m}_{i}-2({r}_{i}-{m}_{i})}{N-i}+</math><math>\displaystyle\frac{{r}_{i}-{m}_{i}}{{r}_{i}}\displaystyle\frac{N-i-{m}_{i}-2({r}_{i}-{m}_{i}-1)-1}{N-i}</math> |
+ | |- | ||
+ | | <math display="inline">{v}_{g}(A)=1+\displaystyle\frac{{m}_{i}}{N-i}\displaystyle\frac{{r}_{i}-1}{{r}_{i}}+</math><math>\displaystyle\frac{1-2{r}_{i}}{N-i}</math> | ||
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (2) | | style="width: 5px;text-align: right;white-space: nowrap;" | (2) | ||
Line 182: | Line 171: | ||
− | The number of | + | The number of <math>I</math>–intervals stays the same if the chosen number is adjacent to a bound of some ''I''–interval. It can belong to some multiple element <math>S</math>–interval or to single element <math>S</math>–interval [1, 1] or <math>[N, N]</math>. The probability for such a case is |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 189: | Line 178: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{ | + | | <math display="inline">{v}_{e}(A)=\displaystyle\frac{{m}_{i}}{{r}_{i}}\frac{2({r}_{i}-{m}_{i})+1}{N-i}+</math><math>\frac{{r}_{i}-{m}_{i}}{{r}_{i}}\displaystyle\frac{2({r}_{i}-{m}_{i}-1)+1}{N-i}=\displaystyle\frac{2{m}_{i}(1-{r}_{i})}{N-i}+</math><math>\displaystyle\frac{{r}_{i}(2{r}_{i}-1)}{N-i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (3) | | style="width: 5px;text-align: right;white-space: nowrap;" | (3) | ||
Line 195: | Line 184: | ||
− | Using expressions (1) – (3), assuming layout | + | Using expressions (1)–(3), assuming layout <math>A</math> after step <math>t = i</math> an average number of <math>I</math>–intervals after step <math>t = i + 1</math> can be calculated as |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 202: | Line 191: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{r}_{i+1}(A)={v}_{}(A)({r}_{i}-1)+{v}_{ | + | |<math display="inline">{r}_{i+1}(A)={v}_{l}(A)({r}_{i}-1)+{v}_{e}(A){r}_{i}+{v}_{g}(A)({r}_{i}+</math><math>1)=\displaystyle\frac{N-i+1}{N-i}+\displaystyle\frac{N-i-2}{N-i}{r}_{i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (4) | | style="width: 5px;text-align: right;white-space: nowrap;" | (4) | ||
Line 209: | Line 198: | ||
===Layout B=== | ===Layout B=== | ||
− | For this layout, the number of | + | For this layout, the number of <math>I</math>–intervals is <math>r_i</math> while the number of <math>S</math>–intervals is <math>r_i</math> – 1 (Figure 2b). The number of <math>I</math>–intervals will be decreased only if the chosen number corresponds to some single element <math>S</math>–interval. The probability for such an event when two <math>I</math>–intervals are merged is |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 216: | Line 205: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(B)=\frac{{m}_{i}}{N-i}</math> | + | | <math display="inline">{v}_{l}(B)=\displaystyle\frac{{m}_{i}}{N-i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (5) | | style="width: 5px;text-align: right;white-space: nowrap;" | (5) | ||
Line 222: | Line 211: | ||
− | The number of | + | The number of <math>I</math>–intervals will be increased if the chosen number neither corresponds to any single element <math>S</math>–interval nor is adjacent to a bound of any <math>I</math>–interval. Then, a new <math>I</math>–interval is created with probability |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 229: | Line 218: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(B)=\frac{N-i-{m}_{i}-2({r}_{i}-1-{m}_{i})}{N-i}</math> | + | | <math display="inline">{v}_{g}(B)=\displaystyle\frac{N-i-{m}_{i}-2({r}_{i}-1-{m}_{i})}{N-i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (6) | | style="width: 5px;text-align: right;white-space: nowrap;" | (6) | ||
Line 235: | Line 224: | ||
− | The number of | + | The number of <math>I</math>–intervals does not change if chosen number corresponds to a bound of some multiple element <math>S</math>–interval. It can happen with probability |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 242: | Line 231: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{ | + | | <math display="inline">{v}_{e}(B)=\displaystyle\frac{2({r}_{i}-1-{m}_{i})}{N-i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (7) | | style="width: 5px;text-align: right;white-space: nowrap;" | (7) | ||
Line 248: | Line 237: | ||
− | Based on expressions (5) – (7), assuming layout | + | Based on expressions (5)–(7), assuming layout <math>B</math> after step <math>t = i</math> an average number of ''I''–intervals after step <math>t = i + 1</math> can be calculated as |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 255: | Line 244: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{r}_{i+1}(B)={v}_{}(B)({r}_{i}-1)+{v}_{ | + | | <math display="inline">{r}_{i+1}(B)={v}_{l}(B)({r}_{i}-1)+{v}_{e}(B){r}_{i}+{v}_{g}(B)({r}_{i}+</math><math>1)=\frac{N-i+2}{N-i}+\frac{N-i-2}{N-i}{r}_{i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (8) | | style="width: 5px;text-align: right;white-space: nowrap;" | (8) | ||
Line 262: | Line 251: | ||
===Layout C=== | ===Layout C=== | ||
− | For this layout, the number of | + | For this layout, the number of <math>I</math>–intervals is <math>r_i</math> while the number of <math>S</math>–intervals is <math>r_i +1</math> (Figure 2c). The number of <math>I</math>–intervals will be decreased if the chosen number corresponds to some single element <math>S</math>–interval other than [1, 1] and <math>[N, N]</math>. |
− | The probability | + | The probability <math>v_0</math> that no one single element <math>S</math>–interval lies on range bounds 1 and <math>N</math> is |
− | + | <div class="auto" style="text-align: center;"> | |
− | + | <math display="inline">{v}_{0}=\frac{\left({}_{{m}_{i}}^{{r}_{i}-1}\right) }{\left( {}_{{m}_{i}}^{{r}_{i}+1}\right) }=</math><math>({r}_{i}+1-{m}_{i})\frac{{r}_{i}-{m}_{i}}{{r}_{i}({r}_{i}+1)}</math></div> | |
− | + | ||
− | + | ||
− | |||
− | + | The probability <math>v_1</math> that one single element <math>S</math>–interval lies on a range bound (1 or <math>N</math>) is | |
− | + | ||
− | + | ||
− | + | ||
− | + | <div class="auto" style="text-align: center;"> | |
+ | <math display="inline">{v}_{1}=2\frac{\left({}_{{m}_{i}-1}^{{r}_{i}-1}\right) }{\left( {}_{{m}_{i}}^{{r}_{i}+1}\right) }=</math><math>2{m}_{i}\frac{{r}_{i}+1-{m}_{i}}{{r}_{i}({r}_{i}+1)}</math></div> | ||
− | |||
− | |||
− | |||
− | |||
− | Then, the probability that number of | + | The probability <math>v_2</math> that two single element <math>S</math>–intervals lie on range bounds 1 and <math>N</math> is |
+ | |||
+ | <div class="auto" style="text-align: center;"> | ||
+ | <math display="inline">{v}_{2}=\frac{\left({}_{{m}_{i}-2}^{{r}_{i}-1}\right) }{\left({}_ {{m}_{i}}^{{r}_{i}+1}\right) }=</math><math>{m}_{i}\frac{{m}_{i}-1}{{r}_{i}({r}_{i}+1)}</math></div> | ||
+ | |||
+ | |||
+ | Then, the probability that number of <math>I</math>–intervals will be decreased is | ||
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 292: | Line 278: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(C)={v}_{2}\frac{{m}_{i}-2}{N-i}+{v}_{1}\frac{{m}_{i}-1}{N-i}+</math><math>{v}_{0}\frac{{m}_{i}}{N-i}=\frac{{m}_{i}}{N-i}({v}_{2}+{v}_{1}+{v}_{0})-\frac{2{v}_{2}+{v}_{1}}{N-i}</math> | + | | <math display="inline">{v}_{l}(C)={v}_{2}\frac{{m}_{i}-2}{N-i}+{v}_{1}\frac{{m}_{i}-1}{N-i}+</math><math>{v}_{0}\frac{{m}_{i}}{N-i}=\frac{{m}_{i}}{N-i}({v}_{2}+{v}_{1}+{v}_{0})-\frac{2{v}_{2}+{v}_{1}}{N-i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (9) | | style="width: 5px;text-align: right;white-space: nowrap;" | (9) | ||
Line 298: | Line 284: | ||
− | The number of | + | The number of <math>I</math>–intervals will be increased if the chosen number neither corresponds to any single element <math>S</math>–interval nor is adjacent to a bound of any <math>I</math>–interval. The probability of such a case is: |
− | |||
− | |||
− | |||
− | |||
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 310: | Line 292: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(C)=\frac{N-i-{m}_{i}-2({r}_{i}-{m}_{i})}{N-i}({v}_{2}+</math><math>{v}_{1}+{v}_{0})-\frac{2{v}_{2}+{v}_{1}}{N-i}</math> | + | | <math display="inline">{v}_{g}(C)={v}_{2}\displaystyle\frac{N-i-{m}_{i}-2({r}_{i}+1-{m}_{i})}{N-i}+</math><math>{v}_{1}\frac{N-i-{m}_{i}-2({r}_{i}-{m}_{i})-1}{N-i}+{v}_{0}\frac{N-i-{m}_{i}-2({r}_{i}-{m}_{i})}{N-i}</math> |
+ | |- | ||
+ | | <math display="inline">{v}_{g}(C)=\displaystyle\frac{N-i-{m}_{i}-2({r}_{i}-{m}_{i})}{N-i}({v}_{2}+</math><math>{v}_{1}+{v}_{0})-\frac{2{v}_{2}+{v}_{1}}{N-i}</math> | ||
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (10) | | style="width: 5px;text-align: right;white-space: nowrap;" | (10) | ||
Line 316: | Line 300: | ||
− | The number of | + | The number of <math>I</math>–intervals stays the same if the chosen number corresponds to a bound of some multiple element <math>S</math>–interval or to a single element <math>S</math>–interval at any bound of range <math>R</math> (1 or <math>N</math>). It can happen with probability |
− | |||
− | |||
− | |||
− | |||
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 328: | Line 308: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{=}(C)=\frac{2({r}_{i}-{m}_{i})}{N-i}({v}_{2}+{v}_{1}+</math><math>{v}_{0})+2\frac{2{v}_{2}+{v}_{1}}{N-i}</math> | + | | <math display="inline">{v}_{e}(C)={v}_{2}\displaystyle\frac{2+2({r}_{i}+1-{m}_{i})}{N-i}+{v}_{1}\displaystyle\frac{2+2({r}_{i}+1-{m}_{i}-1)}{N-i}+</math><math>{v}_{0}\frac{2+2({r}_{i}+1-{m}_{i}-2)}{N-i}</math> |
+ | |- | ||
+ | |<math display="inline">{v}_{e}(C)=\displaystyle\frac{2({r}_{i}-{m}_{i})}{N-i}({v}_{2}+{v}_{1}+</math><math>{v}_{0})+2\frac{2{v}_{2}+{v}_{1}}{N-i}</math> | ||
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (11) | | style="width: 5px;text-align: right;white-space: nowrap;" | (11) | ||
|} | |} | ||
+ | |||
Since it can be calculated from above that | Since it can be calculated from above that | ||
− | + | <div class="auto" style="text-align: center;"> | |
− | + | <math display="inline">\displaystyle\frac{2{v}_{2}+{v}_{1}}{N-i}=\displaystyle\frac{2{m}_{i}}{({r}_{i}+1)(N-i)}</math></div> | |
− | + | ||
− | + | ||
and it holds that | and it holds that | ||
− | + | <div class="auto" style="text-align: center;"> | |
− | + | <math display="inline">{v}_{0}+{v}_{1}+{v}_{2}=1</math>,</div> | |
− | + | ||
− | + | ||
− | the expressions (9) | + | the expressions (9)–(11) become: |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 354: | Line 333: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(C)=\frac{{m}_{i}}{N-i}\frac{{r}_{i}-1}{{r}_{i}+1}</math> | + | | <math display="inline">{v}_{l}(C)=\displaystyle\frac{{m}_{i}}{N-i}\displaystyle\frac{{r}_{i}-1}{{r}_{i}+1}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (12) | | style="width: 5px;text-align: right;white-space: nowrap;" | (12) | ||
Line 364: | Line 343: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{}(C)=\frac{1}{N-i}(N-i+{m}_{i}-2{r}_{i}-\frac{2{m}_{i}}{{r}_{i}+1})</math> | + | | <math display="inline">{v}_{g}(C)=\displaystyle\frac{1}{N-i}(N-i+{m}_{i}-2{r}_{i}-\displaystyle\frac{2{m}_{i}}{{r}_{i}+1})</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (13) | | style="width: 5px;text-align: right;white-space: nowrap;" | (13) | ||
Line 374: | Line 353: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{v}_{ | + | | <math display="inline">{v}_{e}(C)=\displaystyle\frac{2}{N-i}({r}_{i}-{m}_{i}+\displaystyle\frac{2{m}_{i}}{{r}_{i}+1})</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (14) | | style="width: 5px;text-align: right;white-space: nowrap;" | (14) | ||
Line 380: | Line 359: | ||
− | From expressions (12) – (14), assuming layout | + | From expressions (12)–(14), assuming layout <math>C</math> after step <math>t = i</math>, the average number of ''I''–intervals after step <math>t = i + 1</math> can be derived as: |
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 387: | Line 366: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{r}_{i+1}(C)={v}_{}(C)({r}_{i}-1)+{v}_{ | + | | <math display="inline">{r}_{i+1}(C)={v}_{l}(C)({r}_{i}-1)+{v}_{e}(C){r}_{i}+{v}_{g}(C)({r}_{i}+</math><math>1)=1+\displaystyle\frac{N-i-2}{N-i}{r}_{i}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (15) | | style="width: 5px;text-align: right;white-space: nowrap;" | (15) | ||
Line 393: | Line 372: | ||
− | Expressions (4), (8) and (15) shows that an average number of | + | Expressions (4), (8) and (15) shows that an average number of <math>I</math>–intervals (<math>r_i +1</math>) does not depend on the number of single element <math>S</math>–intervals (<math>m_i</math>) for all three possible layouts <math>A</math>, <math>B</math> and <math>C</math>. |
− | Finally, an average number of intervals | + | Finally, an average number of intervals <math>r_i +1</math> after step <math>t = i+1</math> can be calculated as: |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
{| class="formulaSCP" style="width: 100%; text-align: center;" | {| class="formulaSCP" style="width: 100%; text-align: center;" | ||
Line 407: | Line 381: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | | + | | <math display="inline">{r}_{i+1}={v}_{A}(i){r}_{i+1}(A)+{v}_{B}(i){r}_{i+1}(B)+{v}_{C}(i){r}_{i+1}(C)</math> |
+ | |- | ||
+ | | <math display="inline">{r}_{i+1}=1+\displaystyle\frac{2i}{N(N-i)}+(1-\displaystyle\frac{2}{N-i}){r}_{i}\qquad </math> <math>1 \le i \le N-1, \qquad r_1 = 1</math> | ||
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (16) | | style="width: 5px;text-align: right;white-space: nowrap;" | (16) | ||
Line 419: | Line 395: | ||
{| style="text-align: center; margin:auto;" | {| style="text-align: center; margin:auto;" | ||
|- | |- | ||
− | | <math display="inline">{r}_{i+1}=(i+1)\frac{N-i}{N}</math> | + | | <math display="inline">{r}_{i+1}=(i+1)\displaystyle\frac{N-i}{N}</math> |
|} | |} | ||
| style="width: 5px;text-align: right;white-space: nowrap;" | (17) | | style="width: 5px;text-align: right;white-space: nowrap;" | (17) | ||
Line 425: | Line 401: | ||
− | Let <math display="inline">f(i)={r}_{i+1}</math>. The average number of | + | Let <math display="inline">f(i)={r}_{i+1}</math>. The average number of <math>I</math>–intervals as a function of current number of steps <math>i</math> is shown in [[#img-3|Figure 3]]. First and second derivation of this function are |
− | + | <div class="auto" style="text-align: center;"> | |
− | + | <math display="inline">{f}^{'}(i)=\displaystyle\frac{N-1}{N}-\displaystyle\frac{2i}{N}\quad</math> and <math display="inline">\quad {f}^{''}(i)=</math><math>-\frac{2}{N}</math>,</div> | |
− | + | ||
− | + | ||
− | |||
− | { | + | respectively. Since <math display="inline">{f}^{''}(i)\prec 0</math>, <math>f(i)</math> has a local maximum for <math display="inline">{i}_{0}=</math><math>\frac{N-1}{2}</math> (<math display="inline">{f}^{'}({i}_{0})=0</math>). The value of this local maximum is |
− | + | ||
− | + | <div class="auto" style="text-align: center;"> | |
− | + | <math display="inline">M=f(i_0)=\displaystyle\frac{(N+1)^{2}}{4N}</math>.</div> | |
+ | |||
− | For | + | For <math>N \gg 1</math> it gives <math>M\approx N/4</math> which represents the solution of the stated problem. |
+ | <div id='img-3'></div> | ||
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | {| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | ||
|- | |- | ||
|style="padding:10px;"| [[Image:draft_Bojanic_923522951-image5-c.png|366px]] | |style="padding:10px;"| [[Image:draft_Bojanic_923522951-image5-c.png|366px]] | ||
|- style="text-align: center; font-size: 75%;" | |- style="text-align: center; font-size: 75%;" | ||
− | | colspan="1" style="padding:10px;"| '''Figure 3'''. An average number of | + | | colspan="1" style="padding:10px;"| '''Figure 3'''. An average number of <math>I</math>–intervals |
|} | |} | ||
− | =4. Simulation results= | + | ==4. Simulation results== |
− | + | ||
− | + | ||
+ | Theoretical finding of the analytical approach conducted in the previuos section is also verified by means of software simulation. An appropriate program that simulates the procedure of random number generation elaborated in Section 2 is implemented. Random number generator from [11] is exploited in this simulation. For keeping the record of randomly numbers generated so far (implementation of set <math>I</math>) a structure called ''interval tree'' is used. It is a modification of standard binary search tree [12], whose nodes correspond to <math>I</math> –intervals instead of single values. An example of such a tree is given in [[#img-4|Figure 4]]. | ||
+ | <div id='img-4'></div> | ||
{| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | {| style="text-align: center; border: 1px solid #BBB; margin: 1em auto; width: auto;max-width: auto;" | ||
|- | |- | ||
Line 460: | Line 435: | ||
− | Each node has two integer fields for lower and upper bounds of the corresponding interval and two pointer fields to its left and right subtrees, so memory consumed is | + | Each node has two integer fields for lower and upper bounds of the corresponding interval and two pointer fields to its left and right subtrees, so memory consumed is <math>O</math>(<math>M</math>) where <math>M</math> is the maximum number of nodes (<math>I</math> –intervals). All intervals in the left subtree are lower, while the intervals in the right subtree are higher, so an efficient binary search is possible with average time complexity of <math>O</math>(log <math>M</math>) [12]. |
− | The experimental results are presented in Table 1. The size of random number range ( | + | The experimental results are presented in [[#tab-1|Table 1]]. The size of random number range (<math>N</math> – first column) is varied from 100 to 10000000. For each value of <math>N</math>, the results are averaged over 10 experiments. Theoretically expected number of <math>I</math> – intervals is calculated from analytical model as <math display="inline">M_a=\frac{({N+1})^{2}}{4N}</math>and given in the second column. The statistics about the maximum number of nodes (<math>I</math> – intervals) in the interval tree is collected (<math>M_s</math> – third column). Fourth column shows a relative difference between the results of analytical and simulation models calculated as <math>d_m = abs (( M_{a}-M_{s}) / M_{a}) </math>. As this difference decreases with <math>N</math> and become negligible, it is evident that the results of analytical model perfectly match the experimental results. Also, the number of generated values included into tree when the number of nodes reaches the maximum <math>M</math> is also kept (<math>P</math> – fifth column). Then, an average size of an <math>I</math> – interval is calculated as <math>l_{avg} = P/ M_s</math> (sixth column). |
− | + | ||
<div class="center" style="font-size: 75%;"> | <div class="center" style="font-size: 75%;"> | ||
'''Table 1'''. Experimental results</div> | '''Table 1'''. Experimental results</div> | ||
+ | <div id='tab-1'></div> | ||
{| style="margin: 1em auto 0.1em auto;border-collapse: collapse;font-size:85%;width:60%;" | {| style="margin: 1em auto 0.1em auto;border-collapse: collapse;font-size:85%;width:60%;" | ||
|- | |- | ||
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"| | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|<math>N</math> |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"| | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|<math>M_a</math> |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"| | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|<math>M_s</math> |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"| | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|<math>d_m</math> (%) |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"| | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|<math>P</math> |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"| | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|<math>l_{avg}</math> |
|- | |- | ||
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|100 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|100 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|25.5025 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|25.5025 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|27 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|27 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|0.058720 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|0.058720 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|47 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|47 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|1.74 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|1.74 |
|- | |- | ||
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|1000 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|1000 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|250.50025 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|250.50025 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|253 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|253 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|0.009979 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|0.009979 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|482 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|482 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|1.90 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|1.90 |
|- | |- | ||
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|10000 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|10000 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|2500.500025 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|2500.500025 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|2509 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|2509 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|0.003399 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|0.003399 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|4945 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|4945 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|1.97 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|1.97 |
|- | |- | ||
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|100000 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|100000 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|25000.5000025 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|25000.5000025 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|25020 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|25020 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|0.000780 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|0.000780 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|50060 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|50060 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|2.00 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|2.00 |
|- | |- | ||
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|1000000 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|1000000 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|250000.50000025 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|250000.50000025 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|250015 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|250015 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|0.000058 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|0.000058 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|500278 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|500278 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|2.00 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|2.00 |
|- | |- | ||
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|10000000 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|10000000 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|2500000.500000025 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|2500000.500000025 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|2494898 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|2494898 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|0.000041 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|0.000041 |
− | | style="border: 1pt solid black;text-align: right;vertical-align: top;"|5005260 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: right;vertical-align: top;"|5005260 |
− | | style="border: 1pt solid black;text-align: center;vertical-align: top;"|2.00 | + | | style="border-top: 1pt solid black;border-bottom: 1pt solid black;text-align: center;vertical-align: top;"|2.00 |
|} | |} | ||
− | |||
==5. Conclusion== | ==5. Conclusion== | ||
Line 533: | Line 507: | ||
==References== | ==References== | ||
− | [1] Knuth, D. The Art of | + | <div class="auto" style="text-align: left;width: auto; margin-left: auto; margin-right: auto;font-size: 85%;"> |
+ | |||
+ | [1] Knuth, D. The Art of computer programming. 3rd ed. Vol. 2: Seminumerical Algorithms, Addison Wesley, 1997. | ||
− | [2] Zaman S.U., Ghosh R. Review on | + | [2] Zaman S.U., Ghosh R. Review on fifteen statistical tests proposed by NIST. J. of Theor. Phys. and Cryptogr., 1:18-31, 2012. |
[3] Senn S.J., Rothman K.J., Carlin J.B., Poole C., Goodman S.N., Altman D.G. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur. J. of Epidemiol., 31:337-350, 2016. | [3] Senn S.J., Rothman K.J., Carlin J.B., Poole C., Goodman S.N., Altman D.G. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur. J. of Epidemiol., 31:337-350, 2016. | ||
− | [4] Doganaksoy A., Sulak F., Uguz M., Seker O., Akcengiz Y. New | + | [4] Doganaksoy A., Sulak F., Uguz M., Seker O., Akcengiz Y. New statistical ramdomness tests based on length of runs. Math. Probl. in Eng., 2015:14 pages, 2015. |
− | [5] Mohammad O.J., Abbas S., Horbaty E., Salem A. A | + | [5] Mohammad O.J., Abbas S., Horbaty E., Salem A. A new trend of pseudo random number generation using QKD. Int. J. of Comput. Appl., 96:13-17, 2014. |
− | [6] Liu C., Lai X. Evaluation of | + | [6] Liu C., Lai X. Evaluation of statistical tests for randomness using conditional entropy. Int. Conf. on Cyber-enabled Distrib. Comput. and Knowl. Discov., 504-509, 2013. |
− | [7] Hellman M. A | + | [7] Hellman M. A cryptanalytic time-memory trade-off. IEEE Trans. on Inf. Theory, 4:401-406, 1980. |
− | [8] Standaert F., Rouvroy G., Quisquater J., Legat J. A | + | [8] Standaert F., Rouvroy G., Quisquater J., Legat J. A time memory tradeoff using distinguished points; new analysis & FPGA results. Lect. Notes in Comput. Sci., 2523:593-609, 2003. |
− | [9] Oechslin P. Making a | + | [9] Oechslin P. Making a faster cryptanalytic time-memory trade-off. Lect. Notes in Comput. Sci., 2729:617-630, 2003. |
− | [10] Tomašević V., Tomašević M. An | + | [10] Tomašević V., Tomašević M. An analysis of chain characteristics in the cryptanalytic TMTO method. Theor. Comput. Sci., 501:52-61, 2013. |
− | [11] Matsumoto M., Nishimura T. Mersenne | + | [11] Matsumoto M., Nishimura T. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. on Model. and Comput. Simul. 8:3-30, 1998. |
− | [12] Cormen, T., Leiserson, C., Rivest, R., Stein, C. Introduction to | + | [12] Cormen, T., Leiserson, C., Rivest, R., Stein, C. Introduction to algorithms. Third ed. MIT Press, 2009. |
+ | </div> |
For some applications that use pseudorandom numbers it is essential to keep the record of numbers generated so far. Such a representative example is cryptanalytic TMTO approach. In order to save the space, instead of straightforward recording of individual numbers generated, an ordered tree-like data structure which tracks the intervals of generated numbers is proposed. For estimating the memory requirements of this structure as a function of pseudorandom numbers range size, an analytical probabilistic model is established and used. This model determines the maximum number of intervals during recording which corresponds to the tree size. The result obtained from analytical model is fully validated experimentally by means of simulation for a wide spectrum of range sizes.
Keywords: Pseudorandom numbers, interval trees, simulation analysis
Random events are immanent to numerous natural phenomena. Being fully unpredictable they can be modelled by random number sequences which do not follow any specific rules. Computer simulation of random processes requires computer-based generation of random numbers [1]. However, in practice such computer generated number sequences are not entirely random since detereministic algorithms are employed for this purpose. It means that for the same initial state these algorithms always produce the same sequences of numbers. This is the reason why the computer generated numbers are regarded as pseudorandom. Although true randomness can not be expected to achieve in a computer environment, quite often a sequence of pseudorandom numbers satisfies the most of the statistical testes of randomness [2-6]. Without a precise definition of relevant tests which would guarantee the reliability of obtained results, the implementation of pseudorandom generators is usually based on detailed mathematical analysis of their characteristics.
Pseudorandom numbers have a widespread application in various fields. The examples of their intensive use are the areas of statistics, gambling, probability and chance games, computer simulations, cryptography, etc. In some aplications, it is quite essential to constantly keep the record of already generated pseudorandom numbers.
An example that represents an immediate motivation for the analysis conducted in this paper is found in the area of cryptanalysis. This is a well-known Hellman TMTO algorithm which solves the main cryptanalytical problem of inverting one-way function in order to find the secret key [7]. This method, as well as its follow-up improvements [8,9], tries to achive a trade-off between time and memory requirements in this very computationally demanding task. An off-line, precomputation phase of the algorithm generates the fixed-length chains of keys (which can be regarded as pseudorandom), so that an on-line, attack phase can perform more efficient search. However, the main problem is that no record is kept during generation of chains about keys already included, so the multiple occurence of some keys in chains is inevitable. This leads to some irregular situations like looping and merging of chains. On the other hand, since the total number of keys in chains is equal to the size of the key space, multiple occurence of some keys prevents some other keys to be included into chains which incures a lower coverage of the key space. The consequence is that such keys can not be found at all, while the repeated key values are responsible for erroneous situations (false alarms). Therefore, the success of the attack is impaired, making the method probabilistic. The problems can be avoided by keeping the record of the randomly chosen keys which are already included into chains during off-line phase. In this way, resulted perfect chains avoid key repetion and attain full key space coverage, making the method deterministic. The relevant characteristics of perfect chains are examined in [10] by using probabilitic analytical and simulation model.
Pseudorandom number generator produces the numbers by a random choice from a given set, usually a range of numbers. If the range of possible numbers is quite large, which is an often case, the tracking of already generated numbers can be practically infeasible because of the extreme space demands. In order to decrease the memory required for tracking, instead of keeping each individual generated number, only the record of intervals of generated numbers can be kept. Consequently, an ordered dynamic tree-like structure is proposed where each node represents an interval of generated numbers with its lower and upper bounds. As the new numbers are generated and the tree updated accordingly, it is expected that number of nodes increases from one to some maximum number. Then, as new numbers are generated, the nodes are expeted to coalesce and the number of nodes decreases until all numbers are generated and the tree colapses to one node. The main goal of this paper is to determine the maximum number of nodes as a function of size of the generated numbers range.
A precise formal procedure of keeping the record during random number generation and the problem statement are given in Section 2. Then, problem is theoretically solved by an analytical approach which is described in Section 3. The result of this analysis is validated by means of simulation evaluation presented in Section 4. Finally, the conclusion is drawn in Section 5.
Let the random number generator (RNG) generates integers from the given range (denoted as set ) long enough to guarantee that each number from appears at least once. After RNG generates a number, the check is made if this number is already generated before. If not, it is used in application and some record of numbers generated so far is updated; otherwise, RNG generates a new number. The process is carried on until all numbers from are generated and included into the record.
For the sake of clarity of the following presentation and analysis, during the process of random generation we will maintain two sets. The record of numbers generated so far is kept as a set of integers (denoted as ). In order to save the space, it is implemented in an appropriate form of intervals of consecutive integers. These intervals of already used numbers will be referred as –intervals. The remaining numbers from are kept in a set of unused integers (denoted as ), also implemented in the form of intervals (–intervals). In both set implementations, only lower and upper bound for each interval are recorded in memory. The interval can have only one element, referred to as single element interval (e.g., ), or more than one element, referred to as multiple element interval (e.g., [, ], ). Since and sets are complements in respect to set (i.e., ), in practical implementation it is sufficient to maintain only one of these sets for memory efficiency. Figure 1 illustrates alternating layout of –intervals and –intervals within the set . Used numbers are denoted as circles and –intervals are represented as shaded boxes, while unused numbers are denoted with squares and –intervals are represented as white boxes.
Figure 1. Layout of I–intervals and S–intervals |
Let we start with set of integers (i.e., ) and an empty set . In each step, we randomly choose an element from and move it to ; i.e., and . Consequently, after steps we will have and . In each step, the record of numbers already chosen and included into is updated in the following way:
Obviously, after an element is included into , three outcomes are possible in a step:
In step , the number of –intervals is 1. Then, it increases until some maximum is reached as new –intervals are created. After reaching maximum, it is expected that –intervals are progessively coalescing, and their number will fall until only one interval, [1, ], remains in step (all elements from are now moved to , and ).
For practical viability of this procedure, one of the crucial things is the memory needed for keeping the record of generated elements in set (and also of unused elements in ). Memory requirements are directly proportional to since can be implemented as a dynamic structure in which only upper and lower bound of each interval are recorded, as it will be explained in Section 4. In the following sections, the maximum number of –intervals is determined by both analytical and simulation means.
Let us assume that in step , , an element is chosen from and moved to . After that, the number of elements in is . Let the number of –intervals after adding to be denoted as . In the next step , , an element is chosen from . As previosly elaborated, its inclusion into can lead to:
Let the symbols , and stand for probabilities that the number of –intervals will increase, decrease or stay the same, respectively. Also, the following equation must hold
In a given step, these probabilities depend on current positions of –intervals of used numbers within range . Three possible layouts of range are shown in Figure 2. –intervals are sequenced from 1 to and again represented by shaded boxes, while intervening –intervals of unused numbers are represented by white boxes.
Let , and denote the probabilities of layouts , and after step , respectively, and is the number of -combinations from set of elements. Then, these probabilities can be calculated as:
We will now derive the expressions for probabilities , and in case of each particular layout. To this end, we define an important parameter denoted as . It represents an average number of single element –intervals after step . This type of interval is important since its only element which belongs to separates two consecutive –intervals and its selection from and inclusion into in some later step makes that these two –intervals are merged into one.
Let us assume that in step one of elements from is chosen.
In case of this layout, the number of –intervals () is equal to the number of –intervals (Figure 2a). The number of –intervals can be decreased only if the chosen number belongs to some single element –interval other than [1, 1] and . The probability that single element –interval lies at 1 or is , while the probability of the opposite case is 1 – . Therefore, the probability of merging two consecutive –intervals is
|
(1) |
The number of –intervals can be increased only if the chosen number does not belong to some single element –interval and also if it is not adjacent to some bound of any –interval. The probability of such a case is
|
(2) |
The number of –intervals stays the same if the chosen number is adjacent to a bound of some I–interval. It can belong to some multiple element –interval or to single element –interval [1, 1] or . The probability for such a case is
|
(3) |
Using expressions (1)–(3), assuming layout after step an average number of –intervals after step can be calculated as
|
(4) |
For this layout, the number of –intervals is while the number of –intervals is – 1 (Figure 2b). The number of –intervals will be decreased only if the chosen number corresponds to some single element –interval. The probability for such an event when two –intervals are merged is
|
(5) |
The number of –intervals will be increased if the chosen number neither corresponds to any single element –interval nor is adjacent to a bound of any –interval. Then, a new –interval is created with probability
|
(6) |
The number of –intervals does not change if chosen number corresponds to a bound of some multiple element –interval. It can happen with probability
|
(7) |
Based on expressions (5)–(7), assuming layout after step an average number of I–intervals after step can be calculated as
|
(8) |
For this layout, the number of –intervals is while the number of –intervals is (Figure 2c). The number of –intervals will be decreased if the chosen number corresponds to some single element –interval other than [1, 1] and .
The probability that no one single element –interval lies on range bounds 1 and is
The probability that one single element –interval lies on a range bound (1 or ) is
The probability that two single element –intervals lie on range bounds 1 and is
Then, the probability that number of –intervals will be decreased is
|
(9) |
The number of –intervals will be increased if the chosen number neither corresponds to any single element –interval nor is adjacent to a bound of any –interval. The probability of such a case is:
|
(10) |
The number of –intervals stays the same if the chosen number corresponds to a bound of some multiple element –interval or to a single element –interval at any bound of range (1 or ). It can happen with probability
|
(11) |
Since it can be calculated from above that
and it holds that
the expressions (9)–(11) become:
|
(12) |
|
(13) |
|
(14) |
From expressions (12)–(14), assuming layout after step , the average number of I–intervals after step can be derived as:
|
(15) |
Expressions (4), (8) and (15) shows that an average number of –intervals () does not depend on the number of single element –intervals () for all three possible layouts , and .
Finally, an average number of intervals after step can be calculated as:
|
(16) |
Solving the recurrence from (16), we obtain:
|
(17) |
Let . The average number of –intervals as a function of current number of steps is shown in Figure 3. First and second derivation of this function are
respectively. Since , has a local maximum for (). The value of this local maximum is
For it gives which represents the solution of the stated problem.
Figure 3. An average number of –intervals |
Theoretical finding of the analytical approach conducted in the previuos section is also verified by means of software simulation. An appropriate program that simulates the procedure of random number generation elaborated in Section 2 is implemented. Random number generator from [11] is exploited in this simulation. For keeping the record of randomly numbers generated so far (implementation of set ) a structure called interval tree is used. It is a modification of standard binary search tree [12], whose nodes correspond to –intervals instead of single values. An example of such a tree is given in Figure 4.
Figure 4. An example of an interval tree |
Each node has two integer fields for lower and upper bounds of the corresponding interval and two pointer fields to its left and right subtrees, so memory consumed is () where is the maximum number of nodes ( –intervals). All intervals in the left subtree are lower, while the intervals in the right subtree are higher, so an efficient binary search is possible with average time complexity of (log ) [12].
The experimental results are presented in Table 1. The size of random number range ( – first column) is varied from 100 to 10000000. For each value of , the results are averaged over 10 experiments. Theoretically expected number of – intervals is calculated from analytical model as and given in the second column. The statistics about the maximum number of nodes ( – intervals) in the interval tree is collected ( – third column). Fourth column shows a relative difference between the results of analytical and simulation models calculated as . As this difference decreases with and become negligible, it is evident that the results of analytical model perfectly match the experimental results. Also, the number of generated values included into tree when the number of nodes reaches the maximum is also kept ( – fifth column). Then, an average size of an – interval is calculated as (sixth column).
(%) | |||||
100 | 25.5025 | 27 | 0.058720 | 47 | 1.74 |
1000 | 250.50025 | 253 | 0.009979 | 482 | 1.90 |
10000 | 2500.500025 | 2509 | 0.003399 | 4945 | 1.97 |
100000 | 25000.5000025 | 25020 | 0.000780 | 50060 | 2.00 |
1000000 | 250000.50000025 | 250015 | 0.000058 | 500278 | 2.00 |
10000000 | 2500000.500000025 | 2494898 | 0.000041 | 5005260 | 2.00 |
Random number generation is widely exploited in many applications. Not rarely it is required that each number from the given range should be chosen and used in application only once. In that case it is necessary to keep the record of already generated numbers. For very large range sizes the recording procedure can be extremely space demanding. The goal of this paper is to estimate the memory needed for this purpose.
Instead of straightforward recording of each individual number, an advanced data structure in form of an interval tree is employed. It allows for an efficient binary search, and each node represents an interval of already generated numbers keeping only upper and lower bound for each interval. Memory demands are directly proportional to maximum number of nodes in this tree. These demands are first estimated using an analytical probabilistic model. It was obtained that for a range size N the maximum number of intervals (or nodes in the interval tree) is N/4 approximately. This finding is proved by simulation means since the experimental results entirely verified the theoretical one. The fact that the memory savings are within a constant factor of linear memory complexity implies that such a recording is still practically infeasible for very large range sizes.
This work has been partially supported by the Serbian Ministry of Education and Science (the projects III 44006 and III 44009).
[1] Knuth, D. The Art of computer programming. 3rd ed. Vol. 2: Seminumerical Algorithms, Addison Wesley, 1997.
[2] Zaman S.U., Ghosh R. Review on fifteen statistical tests proposed by NIST. J. of Theor. Phys. and Cryptogr., 1:18-31, 2012.
[3] Senn S.J., Rothman K.J., Carlin J.B., Poole C., Goodman S.N., Altman D.G. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur. J. of Epidemiol., 31:337-350, 2016.
[4] Doganaksoy A., Sulak F., Uguz M., Seker O., Akcengiz Y. New statistical ramdomness tests based on length of runs. Math. Probl. in Eng., 2015:14 pages, 2015.
[5] Mohammad O.J., Abbas S., Horbaty E., Salem A. A new trend of pseudo random number generation using QKD. Int. J. of Comput. Appl., 96:13-17, 2014.
[6] Liu C., Lai X. Evaluation of statistical tests for randomness using conditional entropy. Int. Conf. on Cyber-enabled Distrib. Comput. and Knowl. Discov., 504-509, 2013.
[7] Hellman M. A cryptanalytic time-memory trade-off. IEEE Trans. on Inf. Theory, 4:401-406, 1980.
[8] Standaert F., Rouvroy G., Quisquater J., Legat J. A time memory tradeoff using distinguished points; new analysis & FPGA results. Lect. Notes in Comput. Sci., 2523:593-609, 2003.
[9] Oechslin P. Making a faster cryptanalytic time-memory trade-off. Lect. Notes in Comput. Sci., 2729:617-630, 2003.
[10] Tomašević V., Tomašević M. An analysis of chain characteristics in the cryptanalytic TMTO method. Theor. Comput. Sci., 501:52-61, 2013.
[11] Matsumoto M., Nishimura T. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. on Model. and Comput. Simul. 8:3-30, 1998.
[12] Cormen, T., Leiserson, C., Rivest, R., Stein, C. Introduction to algorithms. Third ed. MIT Press, 2009.
Published on 24/06/19
Accepted on 10/06/19
Submitted on 06/06/19
Volume 35, Issue 2, 2019
DOI: 10.23967/j.rimni.2019.06.003
Licence: CC BY-NC-SA license
Are you one of the authors of this document?