Because of constraints in exact modeling, measuring and computing, it is inevitable that algorithms that solve real world problems have to avoid errors. Hence, proposing models to handle error, and designing algorithms that work well in practice, are challenging fields. In this paper, we introduce a model called the -geometry model to handle a dynamic form of imprecision, which allows the precision to change monotonically in the input data of geometric algorithms. -geometry is a generalization of region-based models and provides the output of problems as functions, with respect to the level of precision. This type of output helps to design exact algorithms and is also useful in decision making processes. Furthermore, we study the problem of orthogonal range searching in one and two dimensional space under the model of -geometry, and propose efficient algorithms to solve it.
Imprecise data ; Modeling ; Dynamic imprecision ; Computational geometry ; Range searching problem
It is inevitable that algorithms that solve real world problems have to avoid errors. First, there is error in gathering data because of the limitations of measuring tools and observing methods, like sensor inaccuracies. Also, problem modeling, data processing and, finally, implementing a designed algorithm, are not free of error because of the properties of the real world, which is a non-Euclidean space, and the limitation of computations like finite precision and rounding errors. Accordingly, significant studies have focused on modeling and handling imprecision, and different approaches, such as probability theory [1] , fuzzy set theory [2] and rough set theory [3] , have been presented to deal with uncertainty and imprecision. Beside these approaches, computational geometry has been used also for modeling and handling uncertainty and imprecision [4] , [5] and [6] . Computational geometry has applications in designing algorithms for real problems, such as geographic information systems [7] , planning and robotics [8] , facility locations [9] , mechanical design [10] and pattern processing [11] and [12] .
Geometric approaches to handling imprecision focus on the definition of an imprecise point, and model a point by a geometric region; hence, these approaches are called region-based models. In this area, -geometry as one of the earliest approach models each imprecise point by a disk whose radius is [13] . In fact, -geometry considers the maximum error, , for an imprecise point in all directions. However, interval geometry and tolerance-based approaches [14] and [15] take into account only the error in coordinates, resulting in an axis-aligned rectangular region. Other regions, such as segment, square and convex polygons, have also been considered to represent an imprecise point. A thorough study of region-based models is presented in [4] . Several problems have been studied under the region-based models. For example, finding the largest/smallest area axis aligned bounding box [16] , the largest/smallest area/perimeter convex hull [17] , and the Voronoi diagram and Delaunay triangulation [5] and [18] . Also, dependency as a new concept in uncertainty and error handling was introduced in [6] , and efficient algorithms have been presented for finding the closest and the furthest pairs, as well as solving range searching problems [19] . The problems of finding the diameter and the minimum enclosing circle for a set of imprecise points modeled by a set of finite candidates have been studied in [20] . In this type of modeling, each imprecise point has finite weighted elements as candidates, such that the weights govern the probability that the data point is at that particular location.
Although region-based models are easy and efficient approaches to handle imprecision, they are not yet strong enough to deal with circumstances occurring in real world problems. They assume that the probability of the presence of an imprecise point anywhere in its region is the same; in fact they assume a uniform distribution for all instances of an imprecise point, which is not true in most real applications. Moreover, usually, imprecision in the real world can be decreased by spending more. For example, a Geographical Positioning System (GPS) is a popular measuring approach for determining the location of a point on the earth using satellites. But, there is still considerable error in the GPS, due to satellite signal errors, receiver noise, orbital error, multipath error and atmospheric conditions [21] . A natural technique for increasing the precision of the GPS is to increase the number of satellites, which is called differential GPS. It gathers local information to obtain a more precise measurement. This idea can also be applied to observing and measuring devices, such as light data and ranging [22] and laser scanning systems [23] . Generally, by repeating an experiment, we can achieve more precision. Also, it is possible to decrease computational errors by providing more space for floating points and arithmetic calculations.
The issues mentioned above have motivated us to propose a dynamic model, called the -geometry model, for handling imprecision. We have introduced a parameter, , representing the level of imprecision. It changes continuously in the interval [0, 1] according to circumstances, such as changes in the precision of measurements or computations. This model is a generalization of static region-based models and allows regions to shrink (or grow), according to increase (or decrease) in the level of precision. So, the outputs of problems under the -geometry model are functions based on parameter . For values of near to 0, the solutions are very similar to exact data solutions, and for values of near to 1, algorithms are able to guarantee the correctness of the output, with respect to the worst case of imprecision in the input data. So, for small values of , only closer instances to the exact value of an imprecise point affects the output, and by increasing , the range of such efficient instances extends as well. Consequently, the -geometry model provides different levels of distribution for instances of an imprecise point. We have recently studied the problem of finding the largest axis-aligned bounding box of a set of n imprecise points under this model [24] . We proposed an time algorithm to solve this problem, where k is the maximum complexity of a region representing an imprecise point. To make the paper self-contained, we explain the model of -geometry and investigate some of its properties in the next section. In the third section, we study the problem of orthogonal range searching in one dimensional space under the -geometry model, and propose efficient algorithms to varieties of the problem. In the fourth section, we extend the algorithm to an application-based version of the problem in the plane, and, finally, we conclude our work in the last section.
As mentioned above, the region-based models are simple approaches for representing imprecision. When an imprecise point is modeled by a convex object, like a rectangle, it assumed that the point can present anywhere within the object. So, by changing the levels of precision, like increasing the precision of measuring tools, the size of the object changes, too. This means that the object shrinks by increasing the level of precision, however, it is possible that the amount of shrinking is not the same in all directions. Taking into account these concepts, we introduce the model of -geometry.
An imprecise d -dimensional point in the -geometry model is defined as , where is the exact value of the imprecise point p , is the imprecision matrix in which each vector , for , defines the maximum imprecision in its direction, and parameter shows the imprecision level for each . The parameter, , continuously changes in the interval [0, 1]. So, for any [0, 1], a region is considered for handling an imprecise point. This region, which includes all possible instances of a point, p , is the convex hull of points defined by the sum of the vectors, and . Figure 1 a illustrates an imprecise point, p , in the plane for different s, where and .
|
Figure 1. (a) Imprecise point p for different imprecision levels, , 1/3, 2/3 and 1. (b) Example of a rectangle in the model of -geometry for ; each vertex of the rectangle is an imprecise point with a square region. Note that in this model, it is possible that the angles of a rectangle are not exactly equal to , e.g. an instance of the rectangle for is shown by a blue dashed rectangle. |
For , the imprecise point, , is just the exact point, , while, for , it is the convex hull of the points induced by the vectors of M . Such a convex hull can be defined as follows:
|
where , for are the endpoints of the imprecision vectors of p . Note that, since we define p by the concept of a convex hull, it is possible for a vector to be ineffective. But, without loss of generality, we assume that all vectors are effective.
Now, we focus on the algebraic definition of an instance of an imprecise point, p ( , ):
|
In this definition, is an arbitrary vector that defines an instance, p ( ), to be anywhere in the convex hull mentioned earlier. So, an imprecise point, , is a region defined by:
|
By using the imprecision matrix, region , for an imprecise point, p, is a polygonal shape. So, a disk shaped region which is a popular and practical, representing a region in an imprecise context, is not supported. However, by using more imprecise vectors, we can approximate a disk shaped region, but this causes an increase in the complexity of the region. To efficiently model a disk shaped region in the -geometry model, we define an imprecise point, p, with a disk shaped region, by , where is the center of the disk and r is its radius, which corresponds to .
In many models of imprecision, a line is defined by two imprecise points, which is the union of all lines that pass through the corresponding regions of the points. Similarly, the line constructed by two imprecise points, and , is defined as follows:
|
where is the line passing through and . This definition can be generalized for defining imprecise segments and polygons as well. Figure 1 b shows a rectangle under the model of -geometry.
The -geometry model is a generalization of region-based models with the advantage that it can handle dynamic imprecision. As aforementioned, in the region-based models, there is no difference between instances of an imprecise point in terms of imprecision probability, and all instances are the same, which is against most real applications. In fact, these models assume a uniform distribution for all instances of an imprecise point. For example, when an imprecise point is modeled by a disk in the region-based models, its center and boundary points have the same probability for the presence of the point, while it is not true in general. Such dissimilarity is handled by defining the distribution function in probability theory, or by the membership function in fuzzy theory, and has been applied to imprecise data [25] and [26] . In the -geometry model, parameter plays this role by continuously shrinking the region of an imprecise point. Note that, since we are able to define an arbitrary size for the vectors in the imprecision matrix, M , upper bound 1 for is not important and can be replaced by any positive value.
The output of problems under the -geometry model is one or more functions based on , while it changes continuously in [0, 1]. Such functions can be useful in the decision making process. For instance, the model of -geometry helps to handle the zero problem in machines and in designing robust and stable geometric algorithms [27] . This problem asks if the value of an equation is exactly 0 or not. For example, when we need to determine whether point lies on line or not, we need to solve the equation “ ”. Because of finite precision in the machine, it is possible for a fatal error to occur in solving this equation. Using the -geometry model [13] , we solve the problem “Is ?”. Determining an appropriate value for is difficult and it depends on the machine precision. To smooth this problem, we suggest solving the problem under the -geometry model, which is “Is ?” where continuously changes in [0, 1]. The solution of this problem is functional and three cases may happen:
Final decision in two first cases is straightforward and in the last case decision can be dealt with regarding the magnitude of .
The dynamic view in the model of -geometry can be seen as a trade-off between costs and benefits associated with the imprecision level, . Thus, the model of -geometry is a useful tool for decision making in order to find an optimal level of precision. Indeed, the model of -geometry finds application in formulating the trade-offs that usually exist between cost and benefit in decision-making problems. For example, in facility location problems [28] where the objective function is to find the best place to locate some facilities, the geographical location of potential candidates, as an input of the problem, is estimated. It is not clear whether making this estimation more precise is economical, and if so, to what extent. The model of -geometry helps decision-makers to find the trade-offs which exist between the costs and the benefits associated with this estimation. The same discussion can be made for problems, such as the location-inventory problem [29] , lot-sizing with a supplier selection problem [30] and incorporating the geographical proximity of suppliers.
As an example, assume the problem of the p -center, which is one of the challenging facility location problems for locating emergency facilities, like police offices, and fire and emergency rescue stations [31] . In this problem, we are given n demand points and the goal is finding p facility locations, called p'centers , such that the maximum distance of the demand points from their closest center is minimized. Formally, if is a set of n demand points, and is a set of p centers, the goal is minimizing , where is the distance metric regarding the application, e.g. Euclidean distance, and Z (C ) is the fitness of the solution, C . From the geometric standpoint, this problem is covering n given points by p circles, such that the radius of the largest circle is minimized. Now, assume the problem of the p -center in an imprecise context, where the information about the position of the demand set is not exact or can be changed dynamically. By considering each imprecise point as a demand region, we can discuss the worst and best cases of the fitness of a solution. In fact, the distance between a center and a demand region can be defined as the maximum distance or the minimum distance between them. So, any solution C has two worst and best fitness values. By increasing precision and, consequently, shrinking the demand regions, the difference between the worst and best values of solution C is minimized as well. This difference can be interpreted as concept of risk in an emergency operation. So, any improvement in estimating the demand regions results in reaching a better amount of risk value. Figure 2 illustrates a hypostatical diagram of the worst and best fitness functions, based on parameter . When , the demand region of an imprecise point is exactly one single point, which is the exact position of the imprecise point, and increasing causes the demand region to grow. So, there is a trade-off between and the size of the demand points. In other scenarios of this problem, it is asked which precision level of results in finding a given risk value or a given fitness value.
|
Figure 2. A hypostatical diagram of the worst and best fitness functions in p -center problem. |
Finally, there is a similarity between the -geometry model and mobile and kinetic data [32] and [33] . The physical parameter time in mobile and kinetic problems is similar to parameter in the model of -geometry. The main difference between these concepts is that in the -geometry model, we have exactly one instance for a fixed value of , which can be selected anywhere in the imprecise region, , while, in mobile and kinetic data, we know the exact position of p at any time.
Let be a set of n points on the real line, and be an interval as a query range. The range searching problem is reporting all points over P that lie inside R . This problem can be solved using a balanced binary search tree in query time, where k is the number of reported points. To this end, we need space and time to build the tree in a preprocessing step [34] . Considering imprecise points and ranges, there are three variations of the problem under the -geometry model: precise points and imprecise ranges, imprecise points and precise ranges, and imprecise points and imprecise ranges. Also, since instances of each imprecise point or range can be selected anywhere in its region, we focus on the maximum and minimum number of reported points as the worst and best cases of output in the form of two functions, with respect to parameter .
We denote the i -th imprecise point by , where , such that and . Thus, the leftmost and rightmost instances of are and for the imprecision level, . Similarly, we denote the imprecise range, , where and . Figure 3 shows an example of the general version of the problem in one dimension (for simplicity, we depict imprecise points at different heights). Hollow points show the exact positions and bold ones show the leftmost and rightmost instances of points for . In this figure, when increases from 0 to 1, several types of event may occur. Point lies outside the range for , and by increasing , it is possible that its right instances lie inside the range. Also, its left instances lie outside for all s. Similar cases occur for points , and , however, point lies inside and lies outside the range for all s.
|
Figure 3. An example of the range searching problem in one dimensional space. All points have same height and only for convenience, they are depicted at different heights. |
Therefore, we are interested not only in reporting all such points, but also in computing the corresponding event points for reporting the minimum and maximum output functions, based on . A critical value for is a value at which an instance of an imprecise point enters the range or exits from it. Based on such critical values of , we can report the minimum and maximum possible query points while changes. In what follows, we propose algorithms for the three mentioned cases of the problem.
In this case, the positions of points in P remain fixed when changes, and only the range is imprecise. So, we can map the problem onto the functional space, with respect to the value of in [0, 1]. To this end, we define for . Also, we define four linear range functions: , , and . To rule out degeneracy, we require . Figure 4 shows these functions and the points for the example illustrated in Figure 3 .
|
Figure 4. Functional space of the example illustrated in Figure 3 for precise points and imprecise range. |
To solve the problem, first, as a preprocessing step, we build a binary search tree over set P in time and space. Then, in the query phase, we find all range query points for two ranges, and , on set P , and store the reported points in and , respectively. Let be the size of and be the size of . To construct the minimum number of range point functions, we compute all s corresponding to the intersection of points with and . Note that we need constant time to find the intersection for each point. Similarly, we compute the corresponding intersection s for points with and to construct the maximum number of range point functions. Since and are sorted, critical s can be extracted in an orderly manner. Therefore, the minimum and maximum output functions can be constructed in time.
In this case, we have range and imprecise point set . We build two binary search trees over set and in the preprocessing step, in time and space. contains all n exact points and contains all leftmost and rightmost instances corresponding with . In the query phase, we find the query results for over and . Figure 5 shows the functional space of the example illustrated in Figure 3 for the precise range. Using the results and finding the corresponding intersection s, we can report all k critical s in time, but to construct the minimum and maximum output functions, we need to sort the critical s. Therefore, we use time and space in the preprocessing step, and the output functions can be constructed in query time.
|
Figure 5. Functional space of the example illustrated in Figure 3 for imprecise points and precise range. |
Considering the real applications of range searching problems, usually, k is much smaller than n , however, we can improve the query time to by using more space and time in the preprocessing step. To this end, we compute all intersection points in the arrangement of segments in the functional space. This takes time using a sweep line algorithm, where m is the number of intersections [33] . Finally, in the preprocessing step, we compute all corresponding s between any two sequential intersections and store them in order in a list. In the query phase for a given range, , first, by using a binary search, we find the position of the horizontal lines, and , in the sorted list. Then, we find sorted s in linear time. Therefore, the query time is improved to by using preprocessing space and preprocessing time.
This case is solved by a similar approach to the one described above. We perform two range searching queries with ranges over set and over set . Similar to the case of imprecise points and precise range, we can report the query points in time, and construct the minimum and maximum output functions in time using time and space in the preprocessing step. Unfortunately, the idea explained for improving the time for constructing output functions does not work in this case, because the query lines are not horizontal.
In this section, we study the problem of imprecise range searching in the plane. In this case, two aspects of each object are stored in a database. For example, assume we store the salary and working hours of an employee per month in a database. So, in a two-dimensional range searching problem, we are interested to find all employees whose number of working hours lies between and hours, and earn between and dollars a month. This problem can be modeled geometrically as follows: let be a set of n points in the plane and be a query range. The goal is to find all points which lie in the range. In the presence of imprecision, both database information and range can be imprecise. For example, in the employees’ data, when we record the salary and number of working hours for one or more years, and the query is only for one month, we can use the average and variance of them in the model of -geometry for defining the exact point and imprecision matrix, respectively. Indeed, the salary and number of working hours are stored for several months in the database separately, while the query does not indicate a particular month. With regard to the concepts reviewed in the previous section, we only explain how to solve the last case of the problem, in which both range and points are imprecise. The other two, precise points and imprecise range, and imprecise points and precise range, can be solved by using similar approaches.
An imprecise orthogonal range in the plane similar to one dimensional space, is defined by , where , , , denote the left, right, bottom and top bounds of the range, respectively. Since imprecision of a point can be appeared both horizontally and vertically (e.g. in the example of the employee database, the x -axis shows the salary and the y -axis shows the number of working hours), the imprecise region of each point can be modeled by an axis-aligned rectangle, which shows the maximum tolerance of the point in x and y axes. Figure 6 a shows an imprecise point with an axis-aligned rectangular region. Similarly, an imprecise range is an imprecise orthogonal axis-aligned region. However, an extension of this problem is to allow the regions and/or the ranges to be arbitrary convex polygons. Figure 6 b shows an example of the range searching problem for a hypostatical range in minimum and maximum cases. As described before, these cases can be defined by linear functions, with respect to .
|
Figure 6. (a) Axis-aligned rectangular imprecise point and its candidate points. (b) Example of range searching problem for 10 imprecise points and an imprecise orthogonal range. |
Regarding the imprecision of points and range, we can partition P into three subsets:
For the example illustrated in Figure 6 b, , and the other points belong to . Therefore, for a given imprecise range, , the goal is to report subsets and , and also to construct minimum and maximum output functions. To this end, we define corner points of a rectangular region, in , as four candidate points of p (see Figure 6 a). As aforementioned, these candidate points can be defined by linear functions, with respect to . Let and be the minimum and maximum possible ranges of R ( ) in . Since both the range and points are axis-aligned rectangles, if imprecise point p belongs to , then two cases may arise:
By using this observation, we can define set as such imprecise points, all four candidate points of which lie inside . Similarly, contains imprecise points, all four candidate points of which lie outside . Therefore, we perform the following procedure to partition P into subsets, , and :
Procedure of orthogonal range searching in the plane
Step 1: Solve the range searching problem for range over all candidate points. Let be the reported points.
Step 2: Solve the range searching problem for range over all candidate points. Let be the reported points.
Step 3: Let be all imprecise points, all of whose candidate points lie in .
Step 4: Let be all imprecise points some of whose candidate points lie in .
Step 5: .
After finding and , we can determine the type of points which lie in the range. More precisely, for a point p in , we can find its critical s by computing the intersection of with the boundary of , while changes. This can be done in O(1) because the complexity of each imprecise region is constant. In the illustrated range searching problem in Figure 5 b, , and are points, all of whose instances lie inside the maximum possible range, while other points of always have some instances outside the range, in the worst case.
Let and be the size of and . The explained procedure runs in query time by using a fractional cascading data structure [35] . This data structure needs preprocessing time and space. Because of the trade-off between query time and preprocessing storage, the query can be answered in time using only space and time in the preprocessing step using a range tree structure [34] . These complexities are only for reporting query points, and for constructing minimum and maximum output functions, we need to sort all critical s, as described in the previous section. Thus, it takes time.
We introduce a dynamic model; the -geometry model based on the level of imprecision to handle error in the input data of geometric problems. The model of -geometry, which is a generalization of region based models, provides the output of problems as functions based on the level of precision, which helps the decision maker in the trade-off between cost and benefit. Also, these functions are useful for designing robust and stable geometric algorithms under imprecise data. Since there are many instances of an imprecise point which can be chosen, the definitions of problems under imprecision are not unique. From an application standpoint, usually the worst and best cases of output can be studied in such problems. In this paper, we studied the range searching problem in one and two dimensional space under the -geometry model and proposed efficient algorithms to solve different versions of the problems. Studying the range searching problem in a general case, where imprecise points have an arbitrary polygonal region, remains open, as well as studying the problem in higher dimensions.
We thank the unknown reviewer whose comments improved the paper considerably.
Published on 01/01/2017
Licence: Other
Are you one of the authors of this document?