Beruflich Dokumente
Kultur Dokumente
. If a eld
edge s i exists in p which is missing from p
is used for merging. The i node has been assigned in p, but may or may
not have been assigned in p
, so the value in p
, s s index
nodes are copied directly from p, as there is no UNKNOWN node of s in p
to merge
them with. These merges are common, even for variables and elds, as names are
often initialized in loops, and different storage nodes are built in different paths due
to context-sensitivity.
Comparison to PHP Memory Model
The difference between the PHP run-time memory model and our points-to graphs is
demonstrated in Figure 6.6, corresponding to Listings 6.5 and 6.6. The PHP run-time
memory model for these listings is shown in Figure 6.5. For a simple assignment-by-
copy, the compile-time and run-time models are largely the same.
However, the difference is more visible in an assignment-by-reference. We designed
our analysis to model both may-aliases and must-aliases, so we are not able to simply
use multiple value edges to the same value to model references, as in the PHP run-time
memory model. Instead, we are able to model this by using reference edges between
aliased index nodes. If we modelled the run-time behaviour by simply sharing value
10
As an extension to this scheme, if we could constrain the eld name using string-range analysis [Wasser-
mann and Su, 2007], the analysis would be more precise. String-range analysis allows modelling
possible values of strings. If we knew the range of strings being used to access a eld, we could
discount some possible values, leading to a more precise result.
130 Chapter 6. Static Analysis of Dynamic Scripting Languages
(a) Assignment by copy (see Listing 6.5) (b) Assignment by reference (see Listing 6.6)
Figure 6.5: PHP assignment memory representation
(a) Points-to graph for an assignment by-copy from
Listing 6.5
(b) Points-to graph for a reference assignment from
Listing 6.6. The reference edge marked D is a de-
nite reference edge between $x and $y.
Figure 6.6: Points-to graphs, corresponding to the PHP run-time memory models in Figure 6.5
and storage nodes, we would be unable to model must-reference behaviour, resulting
in a signicant loss of precision. We also use one value node per index node, rather
than sharing value nodes between references. If we shared value nodes, then each
may-denition (an assignment via a may-reference) would need to be a weak update.
Instead, this allows a strong update to be performed on the index node being dened,
and only its may-references are weak-updated.
For a larger example, Figure 6.7b shows the points-to graph for Listing 6.10. The
corresponding run-time memory layout is shown in Figure 6.7a. At run-time, the
symtable has three local variables, $w, $x and $y. The variable $w points to an array.
The variables $x and $y alias each other and also alias the 0th element of $ws array.
The arrays 1th value is also shown. We can see again that references between index
nodes are shown not by sharing the same value nodes, but rather by the edges between
6.5. Analysis 131
1 $x = 5;
2 $y =& $x;
3 $w = array ();
4 $w[0] =& $x;
5 $w[1] = "str";
Listing 6.10: A short program with reference assignments and an array
(a) Run-time memory representation (b) Points-to graph. For simplicity, index nodes are
shown within their storage node, instead of using
eld edges. Edges marked D are denite reference
edges.
Figure 6.7: Memory layouts for Listing 6.10
the aliased index nodes.
Access Paths
As the abstraction used for all tables in the program is the same, we are able to map all
lvalues
11
and rvalues
11
to a traversal of the points-to graph. This mapping is referred
to as an access path [Larus and Hilnger, 1988].
An access path consists of a storage part and an index part, representing the table
being indexed, and its key. Consider a simple assignment $l = $a[$i]. The lvalue is
modelled as:
$l = ST l ;
11
As described in Section 3.1, we use lvalues to refer to names which may be written to, and rvalues to
refer to names which may only be read. However, because of PHPs ability to alter values that are
being read, this isnt a strict denition.
132 Chapter 6. Static Analysis of Dynamic Scripting Languages
that is, $l maps to the local symbol-table indexed by the string l. Both the storage
and index parts can themselves be access paths, required for array dereferences:
$a[$i ] = (ST a) (ST i )
In this case, both the array and its index must rst be fetched from the symbol-table.
We would expect in general that $a will nd a storage node corresponding to an array,
and the $i will nd a value node representing an integer or a string. Variable elds are
modelled the same way.
Variable-variables are modelled as
$$x = ST (ST x)
while simpler array assignments are represented as
$a[0] = (ST a) 0
An access path evaluates to a list of index nodes. If the access path represents an lvalue,
the value(s) of the index nodes are being set. If the path represents an rvalue, the
value(s) of the index nodes will be copied or referenced. The algorithm for converting
an access path to set of index nodes is straightforward:
Each access path where both the storage part s and the index part i are named
is evaluated. The access path will resolve to a storage node named s with a eld
named i .
If the resolved access path p