A Story of a PHP Variable Reference Pitfall


Irakli Nadareishvili

I am one of the lucky people who stumbled upon PHP many years ago, when it was still a little-known, home-grown scripting language. I immediately fell in love with it. There was something pure and wild about it that just did not exist in many other languages, kind of like driving a manual versus an automatic car. The love was permanent, and I’ve always tried to keep up with PHP, even through my dark days of being a .NET, TCL, Java, etc. programmer at my day jobs. Then a few years ago, I was lucky enough to make Drupal my day job, and finally I was back to enjoying PHP in full.

Having had an extensive PHP background, I thought that I had it pretty much nailed and little could surprise me. Certainly not something as “simple” as variable references, I thought. I was wrong.

As we all know, non-object variables in PHP 5 are assigned by value. So if $a is an array and you do $b = $a, $b gets a copy of $a. Well, apparently that is not the whole story. What I discovered the hard way is that even though $b = $a puts a copy of $a in $b, if $a holds an object somewhere down its structure, the copy in $b points to that same object, not to a copy of it. Meaning: $b itself is a copy of $a, but an object nested anywhere inside it is shared, so a change made to that object through $b shows up in $a, and vice versa.

To clarify the matter, consider this test code:

    $a = array();
    $a[0] = 'initial string';
    $obj = new stdClass();
    $obj->title = 'initial title';
    $a[1] = $obj;

    // Assigning an array copies it by value...
    $b = $a;
    $b[0] = 'modified string';
    // ...but this write goes through the object stored in slot 1.
    $b[1]->title = 'modified string';

    print_r($a);
    exit();

You would expect (at least, I did) that the result would be:

    Array
    (
        [0] => initial string
        [1] => stdClass Object
            (
                [title] => initial title
            )

    )

since $b holds a copy of $a and cannot possibly modify values in $a. That is true for any variable except one that holds an object. The actual result you will get is:

    Array
    (
        [0] => initial string
        [1] => stdClass Object
            (
                [title] => modified string
            )

    )

Notice how the string value did not change, but the object's title did.
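A quick way to check whether two arrays share an object is PHP's identity operator, which for objects is true only when both operands are the same instance. The snippet below is a minimal sketch with made-up variable contents, not code from my actual project:

```php
<?php
// $a holds a plain string and an object.
$a = array('initial string', new stdClass());

// Copies the array, including the object handle stored in slot 1.
$b = $a;
$b[0] = 'modified string';

// For objects, === is true only if both sides are the same instance.
$sameObject = ($a[1] === $b[1]);
// Strings were copied by value, so the two slots now differ.
$sameString = ($a[0] === $b[0]);

var_dump($sameObject); // bool(true)
var_dump($sameString); // bool(false)
```
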

Unfortunately, all this was not quite as clear in my case as it is in the test code. After banging my head against this mystery for a while, I found the explanation in Mike Lively’s blog post:

“Reason for this is that any given variable holding an object in php does not technically hold an object. It holds a handle to that object”
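If what you actually want is an independent copy, one option is to clone the objects while copying the array. The helper below is a hypothetical sketch (the name copy_array_with_clones is mine, not part of PHP or Drupal). It assumes a shallow clone of each object is good enough: objects nested inside other objects would still be shared unless their classes implement __clone().

```php
<?php
// Hypothetical helper: copy an array, cloning every object found in it
// and recursing into nested arrays.
function copy_array_with_clones(array $source) {
    $copy = array();
    foreach ($source as $key => $value) {
        if (is_object($value)) {
            $copy[$key] = clone $value; // shallow clone of the object
        } elseif (is_array($value)) {
            $copy[$key] = copy_array_with_clones($value);
        } else {
            $copy[$key] = $value; // scalars are copied by value anyway
        }
    }
    return $copy;
}

$a = array('initial string', new stdClass());
$a[1]->title = 'initial title';

$b = copy_array_with_clones($a);
$b[1]->title = 'modified title';

echo $a[1]->title, "\n"; // initial title
echo $b[1]->title, "\n"; // modified title
```
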

To make things even worse, this rule (in PHP 5.2, to be more precise) crosses many boundaries that you would not expect it to cross. One example: if a function returns the value of a static variable from its own scope, and that variable holds an object, the variable itself is returned by value, but any object it holds is still shared with the caller, effectively by reference. All kinds of nasty things can come out of this little-known pitfall. It is especially dangerous for a framework like Drupal, where complex variable structures are commonly held in static variables inside functions to improve performance.
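Here is a minimal sketch of the static-variable case. The function example_get_settings() and its contents are invented for illustration and only loosely imitate the static-cache pattern; this is not actual Drupal code.

```php
<?php
// Hypothetical cache function: builds its data once and keeps it in a
// static variable for later calls.
function example_get_settings() {
    static $cache = null;
    if ($cache === null) {
        $cache = array('site' => new stdClass());
        $cache['site']->name = 'Original name';
    }
    // The array is returned by value, but the object handle inside it
    // still points at the object held in the static cache.
    return $cache;
}

$first = example_get_settings();
$first['site']->name = 'Changed by caller'; // leaks into the cache
$first['extra'] = 'only in the caller copy'; // stays local to $first

$second = example_get_settings();
echo $second['site']->name, "\n";  // Changed by caller
var_dump(isset($second['extra'])); // bool(false)
```

The caller never touched the static variable directly, yet every later call sees the changed name. Returning clones of the cached objects is one way around this, at the cost of some of the performance the cache was meant to buy.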

Lesson learned: PHP is, indeed, wild, but not always in a good way :)
