2.17.2: 2011-06-17

net.sf.basedb.clients.web.util
Class HTML

java.lang.Object
  extended by net.sf.basedb.clients.web.util.HTML

public class HTML
extends Object

This class contains a set of static methods that may be useful in a web application for handling text/HTML strings.

Version:
2.0
Author:
Nicklas

Field Summary
private static Pattern AMP
           
private static Pattern AMPERSAND
           
private static Pattern AT_OR_DOT
           
private static Pattern BACKSLASH
           
private static Pattern DOUBLE_QUOTE
           
static Pattern EMAIL_REGEXP
          This pattern can be used to find email addresses.
private static Pattern GT
           
private static Pattern HASH
           
private static Pattern IMAGE_EXTENSION
           
private static Pattern LEADING_TRAILING_LINEBREAKS
           
static Pattern LINEBREAKS_REGEXP
          This pattern can be used to find line breaks.
static int LINK_EMAIL
          This flag is used in scanForLinks(String,int) when you want to create links for email addresses.
static int LINK_URL
          This flag is used in scanForLinks(String,int) when you want to create links for URLs.
private static Pattern LT
           
static Pattern MARKUP
          Pattern that matches everything inside an HTML tag.
private static Pattern NEWLINE
           
private static Pattern PERCENT
           
private static Pattern PLUS
           
private static Pattern QUOTE
           
static Pattern SAFE_TAGS
          This pattern is a list of HTML tags considered "safe".
private static Pattern SINGLE_QUOTE
           
static int SMART_IMAGES
          This flag is used in scanForLinks(String,int) when you want to create links for images with an <img> tag instead of an <a> tag.
private static Pattern SPACE
           
static Pattern TAG_REGEXP
          This pattern can be used to find HTML tags.
static Pattern URL_REGEXP
          This pattern can be used to find URLs.
 
Constructor Summary
HTML()
           
 
Method Summary
static String encodeTags(String in)
          Scans a string for HTML tags and replaces all & with &amp;, < with &lt;, > with &gt; and all " with &quot;
static String encodeTags(String in, Pattern safeTags)
           
static String encodeTags(String in, String safeTags)
           
static String formatLineBreaks(String in)
          Finds all linebreaks in a string and replaces them with a <br> tag, except that leading and trailing linebreaks will be removed.
static boolean isValidEmail(String email)
          Checks if the given string looks like an email address.
static boolean isValidUrl(String url)
          Checks if the given string looks like a URL.
static String javaScriptEncode(String in)
          Escape a string to make it safe for use in a JavaScript statement.
static String niceFormat(String in)
          A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, "_new") and formatLineBreaks(in) in a single operation.
static String niceFormat(String in, int flags)
          A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, flags, "_new") and formatLineBreaks(in) in a single operation.
static String niceFormat(String in, int flags, String linkTarget)
          A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, flags, linkTarget) and formatLineBreaks(in) in a single operation.
static String niceFormat(String in, String linkTarget)
          A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, linkTarget) and formatLineBreaks(in) in a single operation.
static String scanForLinks(String in)
          Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.
static String scanForLinks(String in, int flags)
          Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.
static String scanForLinks(String in, int flags, String target)
          Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.
static String scanForLinks(String in, String target)
          Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.
static String stripMarkup(CharSequence in)
          Remove all HTML markup in a string and return what is left.
static int textLength(CharSequence html)
          Counts the length of a string ignoring all characters in HTML markup tags.
static String urlEncode(String in)
          Encode URL-unsafe characters in a string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LINK_URL

public static final int LINK_URL
This flag is used in scanForLinks(String,int) when you want to create links for URLs.

See Also:
Constant Field Values

LINK_EMAIL

public static final int LINK_EMAIL
This flag is used in scanForLinks(String,int) when you want to create links for email addresses.

See Also:
Constant Field Values

SMART_IMAGES

public static final int SMART_IMAGES
This flag is used in scanForLinks(String,int) when you want to create links for images with an <img> tag instead of an <a> tag.

See Also:
Constant Field Values

EMAIL_REGEXP

public static final Pattern EMAIL_REGEXP
This pattern can be used to find email addresses. The pattern checks that an @ symbol is present, that it is preceded by at least one character, and that it is followed by at least one subdomain and one top-level domain. The pattern allows any Unicode letters, digits, underscores and hyphens in the address. It does not check that the domain or email address actually exists.


URL_REGEXP

public static final Pattern URL_REGEXP
This pattern can be used to find URLs. It looks for sequences starting with http://, https://, ftp:// or www. followed by at least one subdomain and one top-level domain, then an optional port number and an optional path including query information.


LINEBREAKS_REGEXP

public static final Pattern LINEBREAKS_REGEXP
This pattern can be used to find line breaks. It matches any combination of carriage return and linefeed characters as well as some Unicode line separator characters.


TAG_REGEXP

public static final Pattern TAG_REGEXP
This pattern can be used to find HTML tags. It matches both start and end tags. The entire tag with attributes is put in the $1 group, the tag name in the $2 group and the attributes in the $3 group.
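
The group layout can be illustrated with a small sketch. The pattern below is a hypothetical stand-in written for this example; it is not the expression BASE uses, and the class and method names (TagPatternSketch, firstTag) are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the $1/$2/$3 group layout; NOT the BASE pattern.
final class TagPatternSketch
{
    // group 1 = the whole tag, group 2 = the tag name, group 3 = the attributes
    static final Pattern TAG =
        Pattern.compile("(</?([A-Za-z][A-Za-z0-9]*)([^>]*)>)");

    // Returns {whole tag, tag name, attributes} for the first tag found,
    // or null if the string contains no tag.
    static String[] firstTag(String html)
    {
        Matcher m = TAG.matcher(html);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3).trim() };
    }
}
```

With this sketch, firstTag("Click <a href=\"x\">here</a>") yields the whole tag <a href="x">, the name a and the attribute string href="x".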


SAFE_TAGS

public static final Pattern SAFE_TAGS
This pattern is a list of HTML tags considered "safe".


MARKUP

public static final Pattern MARKUP
Pattern that matches everything inside an HTML tag.

Since:
2.10

LEADING_TRAILING_LINEBREAKS

private static final Pattern LEADING_TRAILING_LINEBREAKS

AT_OR_DOT

private static final Pattern AT_OR_DOT

IMAGE_EXTENSION

private static final Pattern IMAGE_EXTENSION

AMP

private static final Pattern AMP

LT

private static final Pattern LT

GT

private static final Pattern GT

QUOTE

private static final Pattern QUOTE

PERCENT

private static final Pattern PERCENT

PLUS

private static final Pattern PLUS

SPACE

private static final Pattern SPACE

HASH

private static final Pattern HASH

AMPERSAND

private static final Pattern AMPERSAND

BACKSLASH

private static final Pattern BACKSLASH

NEWLINE

private static final Pattern NEWLINE

SINGLE_QUOTE

private static final Pattern SINGLE_QUOTE

DOUBLE_QUOTE

private static final Pattern DOUBLE_QUOTE
Constructor Detail

HTML

public HTML()
Method Detail

isValidEmail

public static boolean isValidEmail(String email)
Checks if the given string looks like an email address. This is done by trying to match it against the EMAIL_REGEXP pattern.

Parameters:
email - The string to check
Returns:
TRUE if the string matches EMAIL_REGEXP, FALSE otherwise
See Also:
EMAIL_REGEXP
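
A minimal sketch of this kind of check, assuming a pattern built from the EMAIL_REGEXP description. The expression, the class name EmailCheckSketch and the handling of NULL (returns false) are assumptions made for this example, not the BASE implementation.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for HTML.isValidEmail; NOT the BASE implementation.
final class EmailCheckSketch
{
    // Assumed pattern: at least one character before '@', then at least one
    // subdomain and one top-level domain. Unicode letters, digits, underscore
    // and hyphen are allowed; dots separate the parts.
    static final Pattern EMAIL = Pattern.compile(
        "[\\p{L}\\p{N}_\\-]+(?:\\.[\\p{L}\\p{N}_\\-]+)*"
        + "@[\\p{L}\\p{N}_\\-]+(?:\\.[\\p{L}\\p{N}_\\-]+)+");

    static boolean isValidEmail(String email)
    {
        // matches() requires the whole string to match, not just a part of it.
        // Returning false for null is an assumption of this sketch.
        return email != null && EMAIL.matcher(email).matches();
    }
}
```

As in the description of EMAIL_REGEXP, the check is purely syntactic: user@example.com passes, user@localhost does not, and nothing verifies that the address exists.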

isValidUrl

public static boolean isValidUrl(String url)
Checks if the given string looks like a URL. This is done by trying to match it against the URL_REGEXP pattern.

Parameters:
url - The string to check
Returns:
TRUE if the string matches URL_REGEXP, FALSE otherwise
See Also:
URL_REGEXP
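
A minimal sketch of a URL check along the lines of the URL_REGEXP description. The expression, the class name UrlCheckSketch and the handling of NULL (returns false) are assumptions made for this example; the real pattern may accept or reject different strings.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for HTML.isValidUrl; NOT the BASE implementation.
final class UrlCheckSketch
{
    static final Pattern URL = Pattern.compile(
        "(?:(?:https?|ftp)://|www\\.)"                      // scheme or leading "www."
        + "[\\p{L}\\p{N}_\\-]+(?:\\.[\\p{L}\\p{N}_\\-]+)+"  // subdomain(s) and top-level domain
        + "(?::\\d+)?"                                      // optional port number
        + "(?:/\\S*)?");                                    // optional path and query

    static boolean isValidUrl(String url)
    {
        // Returning false for null is an assumption of this sketch.
        return url != null && URL.matcher(url).matches();
    }
}
```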

formatLineBreaks

public static String formatLineBreaks(String in)
Finds all linebreaks in a string and replaces them with a <br> tag, except that leading and trailing linebreaks will be removed.

Parameters:
in - The string to search
Returns:
The new string, or an empty string if NULL was passed
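
The behaviour described above can be sketched as follows. This is an illustration only: it uses Java's \R linebreak matcher (Java 8 or later) in place of LINEBREAKS_REGEXP, and it assumes that every interior line break becomes exactly one <br> tag. The class name LineBreakSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for HTML.formatLineBreaks; NOT the BASE implementation.
final class LineBreakSketch
{
    // \R matches \r\n, \n, \r and several Unicode line separators.
    // \A and \z anchor to the very start and the very end of the input.
    static final Pattern EDGE_BREAKS = Pattern.compile("\\A\\R+|\\R+\\z");
    static final Pattern LINEBREAK = Pattern.compile("\\R");

    static String formatLineBreaks(String in)
    {
        if (in == null) return "";
        // First drop leading and trailing line breaks, then convert the rest
        String trimmed = EDGE_BREAKS.matcher(in).replaceAll("");
        return LINEBREAK.matcher(trimmed).replaceAll("<br>");
    }
}
```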

scanForLinks

public static String scanForLinks(String in)
Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, LINK_URL+LINK_EMAIL+SMART_IMAGES, null);

Parameters:
in - The string to search
Returns:
The new string, or an empty string if NULL was passed

scanForLinks

public static String scanForLinks(String in,
                                  int flags)
Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, flags, null);

Parameters:
in - The string to search
flags - A combination of the following flags to indicate what we should search for: LINK_URL, LINK_EMAIL, SMART_IMAGES
Returns:
The new string, or an empty string if NULL was passed

scanForLinks

public static String scanForLinks(String in,
                                  String target)
Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, LINK_URL+LINK_EMAIL+SMART_IMAGES, target);

Parameters:
in - The string to search
target - The name of the target window in which the link should be opened
Returns:
The new string, or an empty string if NULL was passed

scanForLinks

public static String scanForLinks(String in,
                                  int flags,
                                  String target)
Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.

Parameters:
in - The string to search
flags - A combination of the following flags to indicate what we should search for: LINK_URL, LINK_EMAIL, SMART_IMAGES
target - The name of the target window in which the link should be opened
Returns:
The new string, or an empty string if NULL was passed
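
How the flags and the target parameter might interact can be sketched as below. Everything here is an assumption made for illustration: the flag values (the real ones are listed under Constant Field Values), the two simplified patterns, and the choice to add the target attribute to URL links only. SMART_IMAGES is left out, and the class name LinkScanSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for HTML.scanForLinks; NOT the BASE implementation.
final class LinkScanSketch
{
    // Assumed flag values; distinct bits so that they can be combined with + or |
    static final int LINK_URL = 1;
    static final int LINK_EMAIL = 2;

    // Deliberately simplified patterns, assumed for this sketch only
    static final Pattern URL = Pattern.compile(
        "https?://[\\w\\-]+(?:\\.[\\w\\-]+)+(?:/[^\\s<>\"]*)?");
    static final Pattern EMAIL = Pattern.compile(
        "[\\w\\-.]+@[\\w\\-]+(?:\\.[\\w\\-]+)+");

    static String scanForLinks(String in, int flags, String target)
    {
        if (in == null) return "";
        String targetAttr = target == null ? "" : " target=\"" + target + "\"";
        String out = in;
        if ((flags & LINK_URL) != 0)
        {
            // $0 is the matched URL; the target text is quoted so that any
            // '$' or '\' in it is not read as a group reference
            out = URL.matcher(out).replaceAll(
                "<a href=\"$0\"" + Matcher.quoteReplacement(targetAttr) + ">$0</a>");
        }
        if ((flags & LINK_EMAIL) != 0)
        {
            out = EMAIL.matcher(out).replaceAll("<a href=\"mailto:$0\">$0</a>");
        }
        return out;
    }
}
```

Calling scanForLinks(text, LINK_URL + LINK_EMAIL, "_new") in this sketch links both kinds of address, while passing only LINK_EMAIL leaves URLs untouched.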

encodeTags

public static String encodeTags(String in)
Scans a string for HTML tags and replaces all & with &amp;, < with &lt;, > with &gt; and all " with &quot;

Parameters:
in - The string to search
Returns:
The new string, or an empty string if NULL was passed
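
A minimal sketch of the four replacements, assuming that the double quote is replaced with &quot; and that the replacements are applied in the order shown. The class name EncodeTagsSketch is hypothetical and the code is not taken from BASE.

```java
// Hypothetical stand-in for HTML.encodeTags(String); NOT the BASE implementation.
final class EncodeTagsSketch
{
    static String encodeTags(String in)
    {
        if (in == null) return "";
        // '&' must be replaced first; otherwise the '&' in the entities
        // produced by the later steps would be encoded a second time
        return in.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("\"", "&quot;");
    }
}
```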

encodeTags

public static String encodeTags(String in,
                                String safeTags)

encodeTags

public static String encodeTags(String in,
                                Pattern safeTags)

urlEncode

public static String urlEncode(String in)
Encode URL-unsafe characters in a string. Replaces % with %25, + with %2B, space with +, # with %23 and & with %26.

Parameters:
in - The string to encode
Returns:
The encoded string, or an empty string if NULL was passed
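
The replacements listed above depend on their order, which the following sketch makes explicit. It is an illustration under the assumption that the replacements are applied one after another; the class name UrlEncodeSketch is hypothetical and the code is not taken from BASE.

```java
// Hypothetical stand-in for HTML.urlEncode; NOT the BASE implementation.
final class UrlEncodeSketch
{
    static String urlEncode(String in)
    {
        if (in == null) return "";
        return in.replace("%", "%25")  // first, so that later escapes are not re-encoded
            .replace("+", "%2B")       // before spaces are turned into '+'
            .replace(" ", "+")
            .replace("#", "%23")
            .replace("&", "%26");
    }
}
```

For example, the input "a b+c%d#e&f" becomes "a+b%2Bc%25d%23e%26f" in this sketch.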

javaScriptEncode

public static String javaScriptEncode(String in)
Escape a string to make it safe for use in a JavaScript statement. Replaces \ with \\, newline with \n, ' with \' and " with \".

Parameters:
in - String to escape.
Returns:
The escaped string, ready to be used in JavaScript.
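
A minimal sketch of the four replacements. The order and the handling of NULL (an empty string is returned) are assumptions of this example, since the description above does not state them; the class name JavaScriptEncodeSketch is hypothetical.

```java
// Hypothetical stand-in for HTML.javaScriptEncode; NOT the BASE implementation.
final class JavaScriptEncodeSketch
{
    static String javaScriptEncode(String in)
    {
        if (in == null) return "";        // assumption: null behaviour is not documented
        return in.replace("\\", "\\\\")   // backslash first, so later escapes stay intact
            .replace("\n", "\\n")         // real newline becomes the two characters \ and n
            .replace("'", "\\'")
            .replace("\"", "\\\"");
    }
}
```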

niceFormat

public static String niceFormat(String in)
A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, "_new") and formatLineBreaks(in) in a single operation.

Parameters:
in - The string to format
Returns:
The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed

niceFormat

public static String niceFormat(String in,
                                int flags)
A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, flags, "_new") and formatLineBreaks(in) in a single operation.

Parameters:
in - The string to format
flags - Flags to be used in the call to scanForLinks
Returns:
The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed

niceFormat

public static String niceFormat(String in,
                                String linkTarget)
A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, linkTarget) and formatLineBreaks(in) in a single operation.

Parameters:
in - The string to format
linkTarget - The target parameter to be used in the call to scanForLinks(String,String)
Returns:
The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed

niceFormat

public static String niceFormat(String in,
                                int flags,
                                String linkTarget)
A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, flags, linkTarget) and formatLineBreaks(in) in a single operation.

Parameters:
in - The string to format
flags - Flags to be used in the call to scanForLinks
linkTarget - The target parameter to be used in the call to scanForLinks(String,int,String)
Returns:
The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed

stripMarkup

public static String stripMarkup(CharSequence in)
Remove all HTML markup in a string and return what is left.

Parameters:
in - The string to strip from HTML
Returns:
The resulting string, or null if the input is null
Since:
2.10
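
A minimal sketch of markup stripping, assuming that markup means a <, everything up to the next >, and the > itself. The pattern is a stand-in for MARKUP written for this example, and the class name StripMarkupSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for HTML.stripMarkup; NOT the BASE implementation.
final class StripMarkupSketch
{
    // Assumed markup pattern: '<', anything that is not '>', then '>'
    static final Pattern MARKUP = Pattern.compile("<[^>]*>");

    static String stripMarkup(CharSequence in)
    {
        if (in == null) return null;   // null in, null out, as documented above
        return MARKUP.matcher(in).replaceAll("");
    }
}
```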

textLength

public static int textLength(CharSequence html)
Counts the length of a string, ignoring all characters in HTML markup tags. The result is approximately the length of the string as displayed on screen by a browser. This method counts all characters that are not inside < and >. The actual number of characters displayed by a browser may be smaller because of escaped sequences, e.g. &amp;, and white-space that is collapsed.

Parameters:
html - The HTML string
Returns:
The number of characters that are not HTML markup
Since:
2.10
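
The counting rule can be sketched with a single pass over the characters. This is an illustration only; returning 0 for NULL is an assumption, since the description above does not state the null behaviour, and the class name TextLengthSketch is hypothetical.

```java
// Hypothetical stand-in for HTML.textLength; NOT the BASE implementation.
final class TextLengthSketch
{
    static int textLength(CharSequence html)
    {
        if (html == null) return 0;   // assumption: null behaviour is not documented
        int count = 0;
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++)
        {
            char c = html.charAt(i);
            if (insideTag)
            {
                // Skip everything up to and including the closing '>'
                if (c == '>') insideTag = false;
            }
            else if (c == '<')
            {
                insideTag = true;
            }
            else
            {
                count++;
            }
        }
        return count;
    }
}
```

In this sketch "<b>Hi</b> there" counts as 8 characters, while "a &amp; b" counts as 9 because the escaped sequence is counted in full, which is why the result is only an approximation of what a browser displays.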
