This seems basic, right? In most cases it is, but like almost everything in Java, this problem has its subtle pitfalls. That is mainly because Java does not provide a simple utility method that answers this question. Today I want to share several ways of solving the problem and describe their strengths and weaknesses.
Why should you care?
In many cases this check is unnecessary. If the format of the data is defined and its contract states that the string is an integer, you can just parse it and handle the unlikely exception if parsing fails. The problem arises when there is no such contract and you have to decide what to do next based on whether the string is an integer. In that case a plain try-catch check may be too expensive for you:
public boolean isInteger(String string) {
try {
Integer.valueOf(string);
return true;
} catch (NumberFormatException e) {
return false;
}
}
This method’s execution cost is high for two reasons. First, to determine whether the string is an integer we have to do the whole parsing and then throw the result away. Second, we use exception throwing (which is expensive, largely because every exception captures a stack trace) to direct the program flow. The good thing about this code is its simplicity: you can tell at a glance that the method is correct, and it also rejects values outside the int range.
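If you need the number afterwards anyway, one way to avoid throwing the parse result away is to return it. A minimal sketch (the class and method names are hypothetical); it still pays for the exception on invalid input:

```java
public class TryParse {

    // Returns the parsed value, or null when the string is not a valid int.
    // The caller reuses the result of this single parse instead of parsing twice.
    public static Integer tryParseInt(String string) {
        if (string == null) {
            return null;
        }
        try {
            return Integer.valueOf(string);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("123"));        // 123
        System.out.println(tryParseInt("12a"));        // null
        System.out.println(tryParseInt("2147483648")); // null: above Integer.MAX_VALUE
    }
}
```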
Let’s use RegExp!
A much faster option is to create a regular expression and use it to check whether the string contains an integer or a double. The good thing about this approach is that the regexp can be precompiled once and reused many times:
private static Pattern doublePattern = Pattern.compile("-?\\d+(\\.\\d*)?");
public boolean isDouble(String string) {
return doublePattern.matcher(string).matches();
}
Unfortunately this method has important flaws: the pattern above works for the most basic string representations of a double, but not for more advanced ones like “1.23E-12”. Even if you improve the pattern (believe me, it’s difficult), there are still checks it cannot perform, for instance whether an integer is above Integer.MAX_VALUE.
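To make these limits concrete, here is a sketch of a slightly extended pattern that also accepts an exponent. The class name and both patterns are illustrative assumptions, not a complete grammar (the Javadoc of Double.valueOf(String) documents the full one). Note that the plain integer pattern happily matches a value that does not fit in an int:

```java
import java.util.regex.Pattern;

public class RegexCheck {

    // Illustrative extended pattern: optional sign, digits, optional fraction,
    // optional exponent such as "E-12". It still rejects ".5", "NaN" and "1.0f",
    // all of which Double.parseDouble accepts.
    private static final Pattern DOUBLE_PATTERN =
            Pattern.compile("[+-]?\\d+(\\.\\d*)?([eE][+-]?\\d+)?");

    // Plain integer pattern: optional sign and digits only.
    private static final Pattern INT_PATTERN = Pattern.compile("[+-]?\\d+");

    public static boolean isDouble(String string) {
        return DOUBLE_PATTERN.matcher(string).matches();
    }

    public static boolean looksLikeInt(String string) {
        return INT_PATTERN.matcher(string).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDouble("1.23E-12"));       // true
        System.out.println(isDouble("abc"));            // false
        // The regex cannot see that this value does not fit in an int:
        System.out.println(looksLikeInt("2147483648")); // true, yet Integer.parseInt throws
    }
}
```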
What about Scanner?
There is a way of combining the two approaches shown above: first check with a regexp whether the string could possibly be an integer, and only if it seems to be one, perform the actual parsing. If the regexp is ‘good enough’, the number of false positives that end in a NumberFormatException will be acceptable. The good news is that this approach is already implemented by the Scanner class. See the following example:
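import java.util.Locale;
import java.util.Scanner;
public class ScannerExample {
    public static void main(String[] args) {
        Scanner scanner = new Scanner("Test string: 12.3 dog 12345 cat 1.2E-3");
        // hasNextDouble() uses the scanner's locale (system default unless set); fix it to '.' decimals
        scanner.useLocale(Locale.US);
        while (scanner.hasNext()) {
            if (scanner.hasNextDouble()) {
                double doubleValue = scanner.nextDouble();
                System.out.println("double: " + doubleValue);
            } else {
                String stringValue = scanner.next();
                System.out.println("string: " + stringValue);
            }
        }
    }
}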
In essence, Scanner breaks the given string into tokens separated by whitespace and lets you iterate through them. It provides useful methods like ‘hasNextDouble()’ to check whether the next token is a double, and ‘nextDouble()’ to get it in parsed form. Note that these checks follow the scanner’s locale, which defaults to the system locale.
Internally, Scanner in fact combines the regexp and the exception-catching methods, which makes it quite efficient. The downside is that a Scanner object itself is heavyweight and designed for parsing larger texts, so it may be inefficient if you only need to check short strings like “123”.
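If a Scanner is too heavy for single values, the same regexp-then-parse combination can be written by hand. A minimal sketch (class and pattern names are hypothetical): the regexp rejects most non-numbers cheaply, and Integer.parseInt still catches the rare false positives, such as values above Integer.MAX_VALUE:

```java
import java.util.regex.Pattern;

public class HybridCheck {

    // Hypothetical pre-filter: optional sign followed by ASCII digits only.
    private static final Pattern INT_CANDIDATE = Pattern.compile("[+-]?\\d+");

    public static boolean isInteger(String string) {
        // Cheap step: most non-numbers are rejected here without creating an exception.
        if (string == null || !INT_CANDIDATE.matcher(string).matches()) {
            return false;
        }
        // Expensive step, reached only by candidates: parseInt still rejects
        // false positives such as "9999999999", which exceeds Integer.MAX_VALUE.
        try {
            Integer.parseInt(string);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInteger("12345"));      // true
        System.out.println(isInteger("dog"));        // false
        System.out.println(isInteger("9999999999")); // false
    }
}
```

Because \d in a Java regex matches only ASCII digits by default, this check is slightly stricter than Integer.parseInt, which also accepts other Unicode digits.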
Wait! It does not work for me!!
It is possible that you start using one of the methods above on real-life data and at some point things stop making sense… Why? Because we forgot something important: numbers are locale-sensitive, and their string representation varies from country to country. For instance, ten thousand is written 10,000 in the US, 10 000 in Poland and 10.000 in Italy. None of the methods above, as written, can correctly parse Polish or Italian numbers! What can you do in those cases? You have to parse with the NumberFormat class and a specified locale:
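// requires java.text.NumberFormat, java.text.ParsePosition and java.util.Locale
// note: NumberFormat is not thread-safe, so don't share this instance across threads
private static NumberFormat italianDouble =
        NumberFormat.getNumberInstance(Locale.ITALIAN);
public boolean isItalianDouble(String string) {
    // parse(String) throws a checked ParseException on failure;
    // the ParsePosition overload returns null instead
    return italianDouble.parse(string, new ParsePosition(0)) != null;
}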
Now you can finally see that 10.000 (ten thousand in Italian notation) is a valid integer. Unfortunately, NumberFormat brings another set of problems: it is too liberal in parsing numbers! The method above returns true for 10.000 and false for both abc and x1, but it also returns true for 10abc, because it only requires a number at the beginning of the string, not a total match.
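One way to close this gap is to pass a ParsePosition to parse() and accept the string only when the whole input was consumed. A minimal sketch, with a hypothetical class and method name:

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class LocaleNumberCheck {

    // Returns the parsed number only if the whole string was consumed, else null.
    // A new NumberFormat is created per call because NumberFormat is not thread-safe.
    public static Number parseFully(String string, Locale locale) {
        if (string == null) {
            return null;
        }
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number result = format.parse(string, position);
        // parse() stops at the first character it cannot use; require that it reached the end
        if (result == null || position.getIndex() != string.length()) {
            return null;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseFully("10.000", Locale.ITALIAN)); // 10000
        System.out.println(parseFully("10abc", Locale.ITALIAN));  // null
        System.out.println(parseFully("10,000", Locale.US));      // 10000
    }
}
```

Creating the format per call trades a little speed for thread safety; caching one instance per thread would be an alternative.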
Conclusion
As you can see, none of the solutions shown above is perfect: each has its flaws and advantages. The best choice for you therefore depends strongly on the context of your program. The important factors are: how often you need to perform the check, how often it is expected to fail, whether you parse long human-readable text or just a few given values, and whether you care about locale-specific issues. You may also need a combination of these methods in your code, or some specific tweaks to one of them.
11 Comments so far
Thank you for this post, which is very interesting.
I was aware of most of these methods except the Scanner one…
In the Apache Commons Lang library you could also use the isNumber method of the NumberUtils class (package org.apache.commons.lang.math).
This comes up all the time on internationalized websites: we often have to validate that typed-in data is really an integer (or a double, for an amount for example). This looks like such a basic requirement! And you show that it is not so simple to do…
To get a total match with NumberFormat, you have to pass your own ParsePosition object and check its index afterwards.
String input = "2,33abc";
ParsePosition pp = new ParsePosition(0);
// NUMBERFORMAT is a NumberFormat instance, e.g. NumberFormat.getNumberInstance(Locale.ITALIAN)
Number parsedNumber = NUMBERFORMAT.parse(input, pp);
if (input.length() == pp.getIndex()
        && parsedNumber != null) {
    // complete number was parsed
}
The real issue, IMO, is that NumberFormatException is useless. 99% of the time you will just treat a non-parsable int as 0.
A try/catch is really not that expensive though.
[...] This post was mentioned on Twitter by bubbl. bubbl said: How to check if String is parseable to Integer or Double? http://ff.im/-blm4r [...]
The javadoc comment for Double.valueOf(String):
http://java.sun.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String)
includes a regular expression that recognizes valid inputs.
Well, data validation in general is quite hard and often done wrong… I think the best idea is to use some sort of validation framework…
Why not use org.apache.commons.lang.math.NumberUtils.isDigits(String) or isNumber? I like Apache Commons and use it extensively.
Checks whether the String is a valid Java number.
Valid numbers include hexadecimal marked with the 0x qualifier, scientific notation and numbers marked with a type qualifier (e.g. 123L).
Null and empty String will return false.
You can use DecimalFormatSymbols to make the regexp match the locale settings better.
DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
dfs.getDecimalSeparator();
dfs.getExponentSeparator();
dfs.getGroupingSeparator();
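Building on the DecimalFormatSymbols idea above, a locale-aware pattern might look like the following sketch. The class name and the exact pattern shape are assumptions; exponents are left out, and Pattern.quote() is needed because separators such as '.' are regex metacharacters:

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Pattern;

public class LocaleRegex {

    // Accepts an optional minus sign, then either plain digits or digits grouped in
    // threes by the locale's grouping separator, then an optional decimal part.
    public static Pattern numberPattern(Locale locale) {
        DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(locale);
        String group = Pattern.quote(String.valueOf(dfs.getGroupingSeparator()));
        String decimal = Pattern.quote(String.valueOf(dfs.getDecimalSeparator()));
        String minus = Pattern.quote(String.valueOf(dfs.getMinusSign()));
        String integerPart = "(\\d+|\\d{1,3}(" + group + "\\d{3})+)";
        return Pattern.compile("(?:" + minus + ")?" + integerPart + "(" + decimal + "\\d+)?");
    }

    public static void main(String[] args) {
        Pattern italian = numberPattern(Locale.ITALIAN);
        System.out.println(italian.matcher("1.234,5").matches()); // true
        System.out.println(italian.matcher("1.23").matches());    // false
    }
}
```

Keep in mind that Java typically uses a non-breaking space (U+00A0) as the Polish grouping separator, so “10 000” typed with an ordinary space will not match a pattern built for Polish.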
public static boolean isInt(double value) {
    // Math.floor(Infinity) == Infinity, so exclude infinities explicitly
    return !Double.isInfinite(value) && Math.floor(value) == value;
}
If you are validating a String that might not be a number, write a for() loop and use Character.isDigit(char) before converting the String to a double. Then use the method above. Be sure to add a condition for numbers in scientific notation.
Also – if you already know that your data is a valid double but you want to store something like 1.00000 as an Integer you could simply use:
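double value = Double.parseDouble(myData); // "do" is a reserved word in Java
// true only when there is no fractional part (NaN and infinities give false)
if (value % 1 == 0) {
    // beware: values outside the int range are clamped by this cast
    int i = (int) value;
}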
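For the integer case, the for() loop suggested in the comment above could look roughly like this sketch. The class name is hypothetical; it compares against '0'..'9' directly because Character.isDigit(char) also accepts non-ASCII digits, and it accumulates the value in a long so that overflow is detected without any exception:

```java
public class DigitLoop {

    // Optional sign followed by ASCII digits whose value fits in an int.
    public static boolean isInteger(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        boolean negative = false;
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            if (s.length() == 1) {
                return false; // a lone sign is not a number
            }
            negative = first == '-';
            i = 1;
        }
        // The negative range reaches one further than the positive one.
        long limit = negative ? -(long) Integer.MIN_VALUE : Integer.MAX_VALUE;
        long value = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            value = value * 10 + (c - '0');
            if (value > limit) {
                return false; // outside the int range
            }
        }
        return true;
    }
}
```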