String memory internals
This article looks at how the String class stores text and how interning and the constant pool work.

The main point to understand here is the distinction between a String Java object and its contents: the char[] held in the private value field. String is basically a wrapper around a char[] array, encapsulating it and making it impossible to modify so that the String can remain immutable. The String class also remembers which part of this array is actually used (see below). All this means that you can have two different String objects (quite lightweight) pointing to the same char[].

I will show you a few examples, together with the identity hash code of each String and the hashCode() of its internal char[] value field (I will call it text to distinguish it from string). Finally, I'll show the javap -c -verbose output, together with the constant pool for my test class. Please do not confuse the class constant pool with the string literal pool. They are not quite the same. See also Understanding javap's output for the Constant Pool.
Prerequisites
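For the purpose of testing I created a utility method that breaks String encapsulation:
import java.lang.reflect.Field;

private int showInternalCharArrayHashCode(String s) throws NoSuchFieldException, IllegalAccessException {
    // Read the private "value" field that holds the text of the String
    final Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);
    // Arrays do not override hashCode(), so this number identifies the array instance
    return value.get(s).hashCode();
}
It returns the hashCode() of the char[] value, effectively helping us understand whether this particular String points to the same char[] text or not.
Two string literals in a class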
Let's start from the simplest example.
Java code
String one = "abc";
String two = "abc";
BTW if you simply write "ab" + "c", the Java compiler will perform the concatenation at compile time and the generated code will be exactly the same. This only works if all strings are known at compile time.
Class constant pool
Each class has its own constant pool: a list of constant values that can be reused if they occur several times in the source code. It includes common strings, numbers, method names, etc.

Here are the contents of the constant pool in our example above:
const #2 = String #38; // abc
//...
const #38 = Asciz abc;
The important thing to note is the distinction between the String constant object (#2) and the Unicode encoded text "abc" (#38) that the string points to.
Byte code
Here is the generated byte code. Note that both the one and two references are assigned the same #2 constant pointing to the "abc" string:
ldc #2; //String abc
astore_1 //one
ldc #2; //String abc
astore_2 //two
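The consequence of both variables loading the same constant can be checked directly from Java. The following is only a minimal sketch and not part of the original test class: the class name LiteralIdentityDemo and its method names are made up for illustration. It uses plain reference comparison, so it does not need the reflection helper, and it also covers the compile-time concatenation mentioned earlier.

```java
// Hypothetical demo class; the names are illustrative, not from the original test code.
class LiteralIdentityDemo {

    // Two identical literals in one class resolve to the same pooled String object.
    static boolean literalsShareReference() {
        String one = "abc";
        String two = "abc";
        return one == two;
    }

    // "ab" + "c" is a compile-time constant expression, so the compiler folds it
    // into the single constant "abc" before the class file is written.
    static boolean foldedConcatenationSharesReference() {
        String one = "abc";
        String two = "ab" + "c";
        return one == two;
    }

    public static void main(String[] args) {
        System.out.println("literals share reference: " + literalsShareReference());
        System.out.println("folded concatenation shares reference: "
                + foldedConcatenationSharesReference());
    }
}
```

Both lines should report true, since in each method the two variables end up referring to the same pooled object.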
Output
For each example I am printing the following values:
System.out.println("one.value: " + showInternalCharArrayHashCode(one));
System.out.println("two.value: " + showInternalCharArrayHashCode(two));
System.out.println("one: " + System.identityHashCode(one));
System.out.println("two: " + System.identityHashCode(two));
No surprise that both pairs are equal:
one.value: 23583040
two.value: 23583040
one: 8918249
two: 8918249
This means that not only do both objects point to the same char[] (the same text underneath), so the equals() test will pass, but even more, one and two are the exact same reference! So one == two is true as well. Obviously, if one and two point to the same object, then one.value and two.value must be equal.
Literal and new String()
Java code
Now the example we all waited for: one string literal and one new String using the same literal. How will this work?
String one = "abc";
String two = new String("abc");
The fact that the "abc" constant is used two times in the source code should give you some hint...
Class constant pool
Same as above.
Byte code
ldc #2; //String abc
astore_1 //one
new #3; //class java/lang/String
dup
ldc #2; //String abc
invokespecial #4; //Method java/lang/String."<init>":(Ljava/lang/String;)V
astore_2 //two
Look carefully! The first object is created the same way as above, no surprise. It just takes a constant reference to the already created String (#2) from the constant pool. However, the second object is created via a normal constructor call. But! The first String is passed as an argument. This can be decompiled to:
String two = new String(one);
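What this byte code means at the language level can be sketched as follows. This is an illustration only, using a hypothetical class NewStringDemo that is not part of the original test code. The constructor call always allocates a fresh String object, even though its argument is the pooled literal. Whether the two objects also share their internal array cannot be observed without reflection, so the sketch only compares references and contents.

```java
// Hypothetical demo class; the names are illustrative, not from the original test code.
class NewStringDemo {

    // new String(...) always allocates a new object, so the references differ.
    static boolean sameReference() {
        String one = "abc";
        String two = new String(one);
        return one == two;
    }

    // The new object still represents the same character sequence.
    static boolean sameContents() {
        String one = "abc";
        String two = new String(one);
        return one.equals(two);
    }

    public static void main(String[] args) {
        System.out.println("same reference: " + sameReference());
        System.out.println("same contents: " + sameContents());
    }
}
```

The first line should report false and the second true: two distinct objects with equal contents.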
Output
The output is a bit surprising. The second pair, representing references to the String objects, is understandable: we created two String objects. One was created for us in the constant pool and the second one was created manually for two. But why on earth does the first pair suggest that both String objects point to the same char[] value array?!
one.value: 41771
two.value: 41771
one: 8388097
two: 16585653
It becomes clear when you look at how the String(String) constructor works (greatly simplified here):
public String(String original) {
    this.offset = original.offset;
    this.count = original.count;
    this.value = original.value;
}
See? When you create a new String object based on an existing one, it reuses the char[] value. Strings are immutable, so there is no need to copy a data structure that is known never to be modified. Moreover, since new String(someString) creates an exact copy of an existing string and strings are immutable, there is clearly no reason for the two to exist at the same time.

I think this is the source of some misunderstandings: even if you have two String objects, they might still point to the same contents. And as you can see, the String object itself is quite small.
Runtime modification and intern()
Java code
Let's say you initially used two different strings but after some modifications they are all the same:
String one = "abc";
String two = "?abc".substring(1); //also two = "abc"
The Java compiler (at least mine) is not clever enough to perform such an operation at compile time, have a look:
Class constant pool
Suddenly we ended up with two constant strings pointing to two different constant texts:
const #2 = String #44; // abc
const #3 = String #45; // ?abc
const #44 = Asciz abc;
const #45 = Asciz ?abc;
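The two separate constants are a sign that the compiler folds only constant expressions, not method calls. The sketch below illustrates the difference from the Java side. It assumes a hypothetical class RuntimeStringDemo and adds one case that is not in the original example: a concatenation involving a non-final variable, which is likewise evaluated at run time.

```java
// Hypothetical demo class; the names are illustrative, not from the original test code.
class RuntimeStringDemo {

    // substring(1) runs at run time and produces a new String object,
    // so the result is not the pooled "abc" literal.
    static boolean substringResultIsPooledLiteral() {
        String one = "abc";
        String two = "?abc".substring(1);
        return one == two;
    }

    // The contents are nevertheless equal.
    static boolean substringResultEqualsLiteral() {
        String one = "abc";
        String two = "?abc".substring(1);
        return one.equals(two);
    }

    // Added case: prefix is not final, so prefix + "c" is not a constant
    // expression and is concatenated at run time into a new object.
    static boolean runtimeConcatenationIsPooledLiteral() {
        String prefix = "ab";
        String two = prefix + "c";
        return "abc" == two;
    }

    public static void main(String[] args) {
        System.out.println("substring result is pooled literal: " + substringResultIsPooledLiteral());
        System.out.println("substring result equals literal: " + substringResultEqualsLiteral());
        System.out.println("runtime concatenation is pooled literal: " + runtimeConcatenationIsPooledLiteral());
    }
}
```

Only the equals() check should report true. Both reference comparisons should report false, because each run-time operation creates a new String object.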
Byte code
ldc #2; //String abc
astore_1 //one
ldc #3; //String ?abc
iconst_1
invokevirtual #4; //Method String.substring:(I)Ljava/lang/String;
astore_2 //two
The first string is constructed as usual. The second is created by first loading the constant "?abc" string and then calling substring(1) on it.
Output
No surprise here: we have two different strings, pointing to two different char[] texts in memory:
one.value: 27379847
two.value: 7615385
one: 8388097
two: 16585653
Well, the texts aren't really different: the equals() method will still yield true. We have two unnecessary copies of the same text.

Now we should run two exercises. First, try running:
two = two.intern();
before printing the hash codes. Not only do both one and two point to the same text, but they are the same reference!
one.value: 11108810
two.value: 11108810
one: 15184449
two: 15184449
This means both the one.equals(two) and one == two tests will pass. We also saved some memory, because the "abc" text appears only once in memory (the second copy will be garbage collected).

The second exercise is slightly different, check out this:
String one = "abc";
String two = "abc".substring(1);
Obviously one and two are two different objects, pointing to two different texts. But how come the output suggests that they both point to the same char[] array?!?
one.value: 23583040
two.value: 23583040
one: 11108810
two: 8918249
I'll leave the answer to you. It'll teach you how substring() works, what the advantages of such an approach are, and when it can lead to big trouble.
Lessons learnt
- A String object itself is rather cheap. It's the text it points to that consumes most of the memory
- String is just a thin wrapper around char[] to preserve immutability
- new String("abc") isn't really that expensive, as the internal text representation is reused. But still avoid such a construct.
- When a String is concatenated from constant values known at compile time, the concatenation is done by the compiler, not by the JVM
- substring() is tricky, but most importantly, it is very cheap, both in terms of used memory and run time (constant in both cases)
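
The intern() exercise from the previous section can also be reproduced without the reflection helper. This is a minimal sketch, assuming a hypothetical class InternDemo that is not part of the original test code. It only compares references, so it shows that intern() hands back the pooled instance but says nothing about the internal arrays.

```java
// Hypothetical demo class; the names are illustrative, not from the original test code.
class InternDemo {

    // Without intern(), the run-time result is a separate object from the literal.
    static boolean sameReferenceBeforeIntern() {
        String one = "abc";
        String two = "?abc".substring(1);
        return one == two;
    }

    // intern() returns the canonical pooled instance, which is the one
    // the "abc" literal already refers to.
    static boolean sameReferenceAfterIntern() {
        String one = "abc";
        String two = "?abc".substring(1).intern();
        return one == two;
    }

    public static void main(String[] args) {
        System.out.println("same reference before intern(): " + sameReferenceBeforeIntern());
        System.out.println("same reference after intern(): " + sameReferenceAfterIntern());
    }
}
```

The first check should report false and the second true, matching the identical one and two values in the intern() output shown earlier.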