This post is inspired by an entry on nflath.com about the dangers of the String.substring() method.
Most Java users are probably aware that a String object is more complex than just an array of char. To make the usage of strings in Java more robust, additional measures were taken. For instance, the String pool was created to save memory by reusing the same String objects instead of allocating new ones. Another optimization, the one I want to talk about today, is the addition of the offset and count fields to String instances.
Why were those fields added? Their purpose is to help reuse already allocated structures in some string operations, such as taking substrings of a given string. The idea is that instead of creating a new char array for a substring we can just ‘reuse’ the old one. To be exact, this is what the String.substring() method does: instead of copying a char array for the returned object, it creates a new String that reuses the char[] of the old one. Only the values of the offset and count fields (which indicate the beginning and the length of the new string) are changed. Because the substring operation is used quite often, this mechanism helps to save a lot of memory. It is important to add that this can work only because String objects are immutable. See the following snippet:
    public static void sendEmail(String emailUrl) {
        // The "mailto:" prefix has 7 characters.
        String email = emailUrl.substring(7);
        int at = email.indexOf('@');
        String userName = email.substring(0, at);
        // Skip the '@' itself so that only the domain remains.
        String domainName = email.substring(at + 1);
    }

    public static void main(String[] args) {
        sendEmail("mailto:user_name@domain_name.com");
    }
Thanks to the way substring() is implemented, extracting email, userName and domainName creates three new String objects, but the char array is not copied. All the new strings reuse the character array from emailUrl. For really long URLs this reuse saves approximately 2/3 of the memory we would use otherwise. Great, right?
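To make the sharing concrete, here is a minimal sketch of a string type built around offset and count fields. SharedString and all of its methods are hypothetical names made up for illustration; this is not the real java.lang.String source, only a model of the mechanism described above. Note also that the JDK itself changed later: as far as I know, from Java 7 update 6 on, substring() copies the characters and the offset and count fields are gone.

```java
import java.util.Arrays;

/** Hypothetical, simplified model of a string that shares its backing array. */
final class SharedString {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index in value where this string starts
    private final int count;     // number of characters that belong to this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a view on the same array; no characters are copied. */
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    /** Returns an instance backed by a fresh array that holds only its own characters. */
    SharedString compactCopy() {
        return new SharedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    /** Length of the backing array, which may be larger than the string itself. */
    int backingLength() {
        return value.length;
    }

    boolean sharesArrayWith(SharedString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

With this model, new SharedString("mailto:user_name@domain_name.com").substring(7, 32) is a 25-character string whose backingLength() is still 32, while compactCopy() on it gives a string backed by exactly 25 characters.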
Ok… now the dark side of that optimization! Check out this snippet:
    public static final String START_TAG = "<title>";
    public static final String END_TAG = "</title>";

    public static String getPageTitle(String pageUrl) {
        // Retrieve the HTML with a helper function.
        String htmlPage = getPageContent(pageUrl);
        // Parse the page content to get the title.
        int start = htmlPage.indexOf(START_TAG);
        start = start + START_TAG.length();
        int end = htmlPage.indexOf(END_TAG);
        String title = htmlPage.substring(start, end);
        return title;
    }
Here we extract the title from an HTML page. Can you see the problem with this code? It looks simple and correct, right?
Now, try to imagine that the htmlPage String is huge, more than 100,000 characters, but the title of the page has only 50 characters. Because of the optimization mentioned above, the returned object reuses the char array of htmlPage instead of creating a new one… and this means that instead of a small String object you get back a String that holds on to a 100,000-character array!! If your code invokes the getPageTitle() method many times with different URLs, you may find that you have stored only a thousand titles and are already out of memory!! Scary, right?
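A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below only multiplies the numbers from the paragraph above by the size of a Java char (16 bits, so 2 bytes); it ignores object headers and everything else on the heap, and the class and method names are made up for this example.

```java
final class RetainedMemoryEstimate {
    static final int BYTES_PER_CHAR = 2; // a Java char is a 16-bit value

    /** Approximate bytes of character data kept alive by the given number of strings. */
    static long retainedBytes(int strings, int backingArrayLength) {
        return (long) strings * backingArrayLength * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // 1,000 titles, each still pinning a 100,000-character page array.
        long shared = retainedBytes(1_000, 100_000);
        // 1,000 titles, each backed by its own 50-character array.
        long copied = retainedBytes(1_000, 50);
        System.out.println("shared arrays: " + shared + " bytes"); // 200000000
        System.out.println("copied titles: " + copied + " bytes"); // 100000
    }
}
```

So a thousand shared titles pin roughly 200 MB of character data, while a thousand copied titles need roughly 100 KB.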
Of course there is an easy solution for that: instead of returning title directly, you can return new String(title). The String(String) constructor copies only the part of the underlying char array that the string actually uses, so the created title will really hold only 50 characters. Now we are safe :)
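Put together, the safer version could look roughly like the sketch below. The getPageContent() body is a hypothetical stand-in that returns a fixed page (a real helper would download it), and returning an empty string for a missing title is my own assumption, not something the original code did. On newer JDKs (Java 7 update 6 and later, as far as I know) substring() already copies, so the extra new String(...) is harmless but no longer needed there.

```java
final class PageTitles {
    static final String START_TAG = "<title>";
    static final String END_TAG = "</title>";

    /** Hypothetical stand-in for the helper; a real one would fetch the page. */
    static String getPageContent(String pageUrl) {
        return "<html><head><title>Example page</title></head><body>...</body></html>";
    }

    static String getPageTitle(String pageUrl) {
        String htmlPage = getPageContent(pageUrl);
        int start = htmlPage.indexOf(START_TAG);
        int end = htmlPage.indexOf(END_TAG);
        if (start < 0 || end < start) {
            return ""; // assumption: treat a missing or malformed title as empty
        }
        start += START_TAG.length();
        // On the JDKs this post describes, new String(String) copies only the
        // title's characters, so the big page array is not kept alive by the result.
        return new String(htmlPage.substring(start, end));
    }

    public static void main(String[] args) {
        System.out.println(getPageTitle("http://example.com/")); // Example page
    }
}
```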
So what is the lesson here? Always use new String(String)? No… In general the String optimizations are really helpful and it is worth taking advantage of them. You just have to be careful with what you are doing and be aware of what is going on ‘under the hood’ of your code. The String class API is not intuitive in some situations, so beware! (or just read through it at least once :D)
13 Comments until now
I don’t quite understand. As you mentioned, the underlying reference to the char array is shared between htmlPage and title, isn’t it? This implies that no other 100k char array is created when doing htmlPage.substring(start, end), right? I’m afraid I’m missing something out here
Could you explain further? Thanks!
Yes, this char[] is shared between htmlPage and title Strings. But you’re interested in storing titles, not the pages so probably after executing the function you’ll want to get rid of htmlPage. This is where it will hit you – even though you’re storing only titles the content of the page cannot be garbage collected as it is shared with the title String.
Since this probably was not clear, I’ll change the code in this post to show I am not interested in storing the htmlPage
Interesting and useful article. Good to keep in mind!
Note: looks like the markup inside START_TAG definition ate the END_TAG and the tags themselves (ie. they were not escaped by your blog), you might want to fix that.
Java does so many things wrong, including java.lang.String. The issue here is the forced evaluation of the data structure. The forced evaluation is required since Java is a forcefully imperative language with no flexibility.
Functional Java has a lazy list (fj.data.Stream) data structure that will solve your problem. Take a look at the drop/take methods some time.
Better still, upgrade the language to a more practical one.
I’d probably use StringBuffer if I knew a String was going to be more than 1000 characters long. In fact, that’s probably the way to go for substring anyways if you don’t care about the original string being cached.
pls help me understand if String object is garbage collected?
As far as I know String is stored in a pool, therefore once created it cannot be garbage collected?
A very good article.
Awesome! Really appreciate your job!
> pls help me understand if String object is garbage collected?
> As far as I know String is stored in a pool, therefore once created it cannot be garbage collected?
if u define a variable like:
String z = "aaa"; // this value will be taken from String’s pool.
String y = new String("aaa"); // here a new string object with value "aaa" will be created regardless of what u have in ur pool.
ps: afaik some of values from String’s global pool, will be GC-ed at some point.. but this is not smth u should care about.
> If your code will invoke getPageTitle() method many times you may find out that you have stored only a thousand titles and already you are out of memory!! Scary, right?
not really. as u mentioned u have only one 100k char array, so if u run out of memory that’s not bckz u invoked 1000 times getPageTitle().
once u have 1 String pointing at the title u already have a live reference, so there should not be a big difference 1 or 1000 references.
By saying ” your code will invoke getPageTitle() method many times” I meant invocation with different URLs to get different pages. Since URLs are different the page content is also different and there is no sharing of strings between requests.
Even if you executed this method with the same parameter 1000 times, since the content is downloaded from the web you would probably get different string objects that have the same content but do not share a char array (unless getPageContent(pageUrl) implements some caching mechanism or interns the results).