[Prévia] [Próxima] [Prévia por assunto] [Próxima por assunto]
[Índice cronológico] [Índice de assunto]

forwarded message from JDC Tech Tips



Uma mensagem para os fanáticos por Java ou para aqueles que querem ficar
realmente especialistas nessa linguagem:

A Sun tem um serviço chamado JDC Tech Tips que envia 1 mensagem por mes
contendo 2 dicas interessantes sobre a linguagem Java. Para se inscrever basta
visitar a página que está em algum dos links aí em baixo. Eu recebo isso já há
alguns anos e nesse tempo já vi muita coisa interessante. Hoje em dia, devido
a falta de tempo eu quase sempre acabo apagando a mensagem depois de ler as
primeiras linhas :-|

[]s,
Fabio.

-- BEGIN included message

Title: Core Java Technologies Technical Tips
image
image
Core Java Technologies Technical Tips
image
   View this issue as simple text September 10, 2002    

In this Issue

WELCOME to the Core JavaTM Technologies Tech Tips, September 10, 2002. Here you'll get tips on using core Java technologies and APIs, such as those in Java 2 Platform, Standard Edition (J2SETM).

Using ArrayList and LinkedList
Using Zero-Length Arrays

These tips were developed using Java 2 SDK, Standard Edition, v 1.4.

This issue of the JDC Tech Tips is written by Glen McCluskey.

Pixel

USING ARRAYLIST AND LINKEDLIST

ArrayList and LinkedList are two Collections classes used for storing lists of object references. For example, you could have an ArrayList of Strings, or a LinkedList of Integers. This tip compares the performance of ArrayList and LinkedList, and offers some suggestions about which of these classes is the right choice in a given situation.

The first key point is that an ArrayList is backed by a primitive Object array. Because of that, an ArrayList is much faster than a LinkedList for random access, that is, when accessing arbitrary list elements using the get method. Note that the get method is implemented for LinkedLists, but it requires a sequential scan from the front or back of the list. This scan is very slow. For a LinkedList, there's no fast way to access the Nth element of the list.

Consider the following example. Suppose you have a large list of sorted elements, either an ArrayList or a LinkedList. Suppose too that you do a binary search on the list. The standard binary search algorithm starts by checking the search key against the value in the middle of the list. If the middle value is too high, then the upper half of the list is eliminated. However, if the middle value is too low, then the lower half of the list is ignored. This process continues until the key is found in the list, or until the lower bound of the search becomes greater than the upper bound.

Here's a program that does a binary search on all the elements in an ArrayList or a LinkedList:

    import java.util.*;

    public class ListDemo1 {
        static final int N = 10000;

        static List values;

        // make List of increasing Integer values

        static {
           Integer vals[] = new Integer[N];

           Random rn = new Random();

           for (int i = 0, currval = 0; i < N; i++) {
               vals[i] = new Integer(currval);
               currval += rn.nextInt(100) + 1;
           }

           values = Arrays.asList(vals);
        }
        
        // iterate across a list and look up every
        // value in the list using binary search

        static long timeList(List lst) {
            long start = System.currentTimeMillis();

            for (int i = 0; i < N; i++) {

               // look up a value in the list 
               // using binary search

               int indx = Collections.binarySearch(
                                   lst, values.get(i));

               // sanity check for result 
               // of binary search

               if (indx != i) {
                   System.out.println(
                                    "*** error ***\n");
               }
           }

           return System.currentTimeMillis() - start;
       }

       public static void main(String args[]) {

           // do lookups in an ArrayList 

           System.out.println("time for ArrayList = " +
               timeList(new ArrayList(values)));

           // do lookups in a LinkedList

           System.out.println(
               "time for LinkedList = " +
               timeList(new LinkedList(values)));
       }
    }

The ListDemo1 program sets up a List of sorted Integer values. It then adds the values to an ArrayList or a LinkedList. Then Collections.binarySearch is used to search for each value in the list.

When you run this program, you should see a result that looks something like this:

    time for ArrayList = 31

    time for LinkedList = 4640

ArrayList is about 150 times faster than LinkedList. (Your results might differ depending on your machine characteristics, but you should see a distinct difference in the result for ArrayList as compared to that for LinkedList. The same is true for the other programs in this tip.) Clearly, LinkedList is a bad choice in this situation. The binary search algorithm inherently uses random access, and LinkedList does not support fast random access. The time to do a random access in a LinkedList is proportional to the size of the list. By comparison, random access in an ArrayList has a fixed time.

You can use the RandomAccess marker interface to check whether a List supports fast random access:

    void f(List lst) {
        if (lst instanceof RandomAccess) {
            // supports fast random access
        }
    }

ArrayList implements the RandomAccess interface, and LinkedList. does not. Note that Collections.binarySearch does take advantage of the RandomAccess property, to optimize searches.

Do these results prove that ArrayList is always a better choice? Not necessarily. There are many cases where LinkedList does better. Also note that there are many situations where an algorithm can be implemented efficiently for LinkedList. An example is reversing a LinkedList using Collections.reverse. The internal algorithm does this, and gets reasonable performance, by using forward and backward iterators.

Let's look at another example. Suppose you have a list of elements, and you do a lot of element inserting and deleting to the list. In this case, LinkedList is the better choice. To demonstrate that, consider the following "worst case" scenario. In this demo, a program repeatedly inserts elements at the beginning of a list. The code looks like this:

    import java.util.*;

    public class ListDemo2 {
        static final int N = 50000;

        // time how long it takes to add 
        // N objects to a list

        static long timeList(List lst) {
            long start = System.currentTimeMillis();

            Object obj = new Object();

            for (int i = 0; i < N; i++) {
                lst.add(0, obj);
            }

            return System.currentTimeMillis() - start;
        }

        public static void main(String args[]) {

            // do timing for ArrayList

            System.out.println(
                "time for ArrayList = " +
                timeList(new ArrayList()));

            // do timing for LinkedList

            System.out.println(
                "time for LinkedList = " +
                timeList(new LinkedList()));
        }
    } 

When you run this program, the result should look something like this:

    time for ArrayList = 4859

    time for LinkedList = 125

These results are pretty much the reverse of the previous example.

When an element is added to the beginning of an ArrayList, all of the existing elements must be pushed back, which means a lot of expensive data movement and copying. By contrast, adding an element to the beginning of a LinkedList simply means allocating an internal record for the element and then adjusting a couple of links. Adding to the beginning of a LinkedList has fixed cost, but adding to the beginning of an ArrayList has a cost that's proportional to the list size.

So far, this tip has looked at speed issues, but what about space? Let's look at some internal details of how ArrayList and LinkedList are implemented in Java 2 SDK, Standard Edition v 1.4. These details are not part of the external specification of these classes, but are illustrative of how such classes work internally.

The LinkedList class has a private internal class defined like this:

    private static class Entry {
        Object element;
        Entry next;
        Entry previous;
    } 

Each Entry object references a list element, along with the next and previous elements in the LinkedList -- in other words, a doubly-linked list. A LinkedList of 1000 elements will have 1000 Entry objects linked together, referencing the actual list elements. There is significant space overhead in a LinkedList structure, given all these Entry objects.

An ArrayList has a backing Object array to store the elements. This array starts with a capacity of 10. When the array needs to grow, the new capacity is computed as:

    newCapacity = (oldCapacity * 3) / 2 + 1;

Notice that the array capacity grows each time by about 50%. This means that if you have an ArrayList with a large number of elements, there will be a significant amount of space wasted at the end. This waste is intrinsic to the way ArrayList works. If there was no spare capacity, the array would have to be reallocated for each new element, and performance would suffer dramatically. Changing the growth strategy to be more aggressive (such as doubling the size at each reallocation) would result in slightly better performance, but it would waste more space.

If you know how many elements will be in an ArrayList, you can specify the capacity to the constructor. You can also call the trimToSize method after the fact to reallocate the internal array. This gets rid of the wasted space.

So far, this discussion has assumed that either an ArrayList or a LinkedList is "right" for a given application. But sometimes, other choices make more sense. For example, consider the very common situation where you have a list of key/value pairs, and you would like to retrieve a value for a given key.

You could store the pairs in an N x 2 Object array. To find the right pair, you could do a sequential search on the key values. This approach works, and is a useful choice for very small lists (say 10 elements or less), but it doesn't scale to big lists.

Another approach is to sort the key/value pairs by ascending key value, store the result in a pair of ArrayLists, and then do a binary search on the keys list. This approach also works, and is very fast. Yet another approach is to not use a list structure at all, but instead use a map structure (hash table), in the form of a HashMap.

Which is faster, a binary search on an ArrayList, or a HashMap? Here's a final example that compares these two:

    import java.util.*;

    public class ListDemo3 {
        static final int N = 500000;

        // Lists of keys and values

        static List keys;
        static List values;

        // fill the keys list with ascending order key 
        // values and fill the values list with
        // corresponding values (-key)

        static {
            Integer keyvec[] = new Integer[N];
            Integer valuevec[] = new Integer[N];

            Random rn = new Random();

            for (int i = 0, currval = 0; i < N; i++) {
                keyvec[i] = new Integer(currval);
                valuevec[i] = new Integer(-currval);
                currval += rn.nextInt(100) + 1;
            }

            keys = Arrays.asList(keyvec);
            values = Arrays.asList(valuevec);
        }

        // fill a Map with key/value pairs

        static Map map = new HashMap();

        static {
            for (int i = 0; i < N; i++) {
                map.put(keys.get(i), values.get(i));
            }
        }

        // do binary search lookup of all keys

        static long timeList() {
            long start = System.currentTimeMillis();

            for (int i = 0; i < N; i++) {
                int indx = Collections.binarySearch(
                                    keys, keys.get(i));

                // sanity check of returned value 
                // from binary search

                if (indx != i) {
                    System.out.println(
                                    "*** error ***\n");
                }
            }

            return System.currentTimeMillis() - start;
        }

        // do Map lookup of all keys

        static long timeMap() {
            long start = System.currentTimeMillis();

            for (int i = 0; i < N; i++) {
                Integer value = (Integer)map.get(
                                          keys.get(i));

                // sanity check of value returned 
                // from map lookup

                if (value != values.get(i)) {
                    System.out.println(
                                    "*** error ***\n");
                }
            }

            return System.currentTimeMillis() - start;
        }

        public static void main(String args[]) {

            // do timing for List implementation

            System.out.println("List time = " + 
                                           timeList());

            // do timing for Map implementation

            System.out.println("Map time = " + 
                                            timeMap());
        }
    }

The program sets up Lists of keys and values, and then uses two different techniques to map keys to values. One approach uses a binary search on a list, the other a hash table.

When you run the ListDemo3 program, you should get a result that looks something like this:

    ArrayList time = 1000

    HashMap time = 281

In this example, N has a value of 500000. Approximately, log2(N) - 1 comparisons are required in an average successful binary search, so each binary search lookup in the ArrayList will take about 18 comparisons. By contrast, a properly implemented hash table typically requires only 1-3 comparisons. So you should expect the hash table to be faster in this case.

However, binary search is still useful. For example, you might want to do a lookup in a sorted list and then find keys that are close in value to the key used for the lookup. Doing this is easy with binary search, but impossible in a hash table. Keys in a hash table are stored in apparent random order. Also, if you are concerned with worst-case performance, the binary search algorithm offers a much stronger performance guarantee than a hash table scheme. You might also consider using TreeMap for doing lookups in sorted collections of key/value pairs.

Let's summarize the key points presented in this tip:

  • Appending elements to the end of a list has a fixed averaged cost for both ArrayList and LinkedList. For ArrayList, appending typically involves setting an internal array location to the element reference, but occasionally results in the array being reallocated. For LinkedList, the cost is uniform and involves allocating an internal Entry object.
  • Inserting or deleting elements in the middle of an ArrayList implies that the rest of the list must be moved. Inserting or deleting elements in the middle of a LinkedList has fixed cost.
  • A LinkedList does not support efficient random access
  • An ArrayList has space overhead in the form of reserve capacity at the end of the list. A LinkedList has significant space overhead per element.
  • Sometimes a Map structure is a better choice than a List.

For more information about Using ArrayList and LinkedLists, see section 16.6, List, and section 16.7, Map and SortedMap, in "The JavaTM Programming Language Third Edition" by Arnold, Gosling, and Holmes. Also see the Collections trail in the Java Tutorial, Third Edition.

Pixel
Pixel

USING ZERO-LENGTH ARRAYS

Suppose that you are writing a Java application that involves some sort of data filtering. You have some raw data that you want to clean up and process in various ways. One of the methods you've written takes an array of integers, along with minimum and maximum values. The method goes through the array and eliminates "outliers," that is, values that are too small or too big. The result of the filtering process is a new array of cleaned-up data.

How could you implement this kind of filtering method? Here's one approach:

    import java.util.*;
    
    public class ZeroDemo1 {
    
        // filter input array and throw away values 
        // that are less than minval or greater than 
        // maxval
    
        static int[] filterData(
                int indata[], int minval, int maxval) {
    
            // check parameters for errors 
    
            if (indata == null) {
                throw new NullPointerException(
                                     "indata is null");
            }
            if (maxval < minval) {
                throw new IllegalArgumentException(
                "maxval < minval");
            }
    
            // count number of valid values 
            // in input array
    
            int validcnt = 0;
            for (int i = 0; i < indata.length; i++) {
                if (indata[i] >= minval && indata[i] 
                                         <= maxval) {
                    validcnt++;
                }
            }
    
            // if no valid values, return null
    
            if (validcnt == 0) {
                return null;
            }
    
            // copy valid values to new array 
            // and return it
    
            int outdata[] = new int[validcnt];
            for (int i = 0, j = 0; 
                              i < indata.length; i++) {
                if (indata[i] >= minval && indata[i] 
                                           <= maxval) {
                    outdata[j++] = indata[i];
                }
            }
            return outdata;
        }
    
        public static void main(String args[]) {
    
            // set up test array of integers
    
            int indata[] = new int[]{1, 3, -17, 8, 59};
    
            // filter out values not in the range 1-10
    
            int outdata1[] = filterData(indata, 1, 10);
            for (int i = 0; i < outdata1.length; i++) {
                System.out.println(outdata1[i]);
            }
    
            // filter out values not 
            // in the range 100-200
    
            int outdata2[] = filterData(
                                     indata, 100, 200);
            for (int i = 0; i < outdata2.length; i++) {
                System.out.println(outdata2[i]);
            }
        }
    }

The filterData method does two scans of the input array. The first scan counts the number of valid data values. Then the method allocates a new array of the appropriate size, and copies the good values to it. If there are no good values, the method returns a null value for the array reference.

The result of running the ZeroDemo1 program is:

    1
    3
    8
    Exception in thread "main"
    java.lang.NullPointerException
        at ZeroDemo1.main(ZeroDemo1.java:72)

The problem with this program is a rather basic one. The second call of filterData returns a null value, and the program fails to take this possibility into account.

A better approach in this example would be to comment out the block of code that tests for the possibility of no valid data values:

    /*
    if (validcnt == 0) {
        return null;
    }
    */

When there is no valid data, the code will fall through to the next line, where a zero-length array is allocated:

    int outdata[] = new int[0];

This is perfectly legal Java usage. The representation of Java arrays includes the length of the array, and it's therefore possible to tell if an array has zero length.

For the ZeroDemo1 example, if you anticipate that validcnt will often be zero, that is, data filtering will often eliminate all values from the input, then you could optimize away the second input scan by adding code like this:

    int outdata[] = new int[validcnt];
    if (validcnt == 0) {
        return outdata;
    }

Note that usage such as:

    int outdata[] = new int[]{};

is also legal, to initialize an array with a zero-length set of integer constants.

In general, it's best not to return null from a method that returns an array type. Always returning an array, even if the array has zero length, greatly improves the generality of algorithms. If you anticipate that your methods will often return zero-length arrays, you might be concerned about the performance implications of allocating many such arrays. In this case, you can allocate a single array, and always return the same one, as follows:

    private static final int ZERO_LENGTH_ARRAY[] = 
                                            new int[0];

This array is immutable (it can't be changed), and can be shared throughout your application.

There's another way that zero-length arrays are used, as illustrated in the following example:

    import java.util.*;
    
    public class ZeroDemo2 {
        public static void main(String args[]) {
    
            // set up ArrayList and add strings to it
    
            List stringlist = new ArrayList();
            stringlist.add("string 1");
            stringlist.add("string 2");
            stringlist.add("string 3");
    
            // convert to String array
    
            String out[] = (
                          String[])stringlist.toArray(
            new String[0]);
            for (int i = 0; i < out.length; i++) {
                System.out.println(out[i]);
            }
        }
    }

The result of running the ZeroDemo2 program is:

    string 1
    string 2
    string 3

The ZeroDemo2 program sets up an ArrayList, and adds three strings to it. Then the program calls toArray to get a String array of the three elements in the ArrayList. In this example, the argument to toArray is "new String[0]". This argument serves a couple of purposes. First, if you have a String array that's big enough to hold the elements of the ArrayList, you can specify it as the argument to toArray. The method will use the String array.

But if your array is not big enough, then the toArray method allocates an array to return the elements. It uses the type of the array that was passed in to determine the type of the allocated array. ArrayList stores its element references in an Object array. The toArray method needs to be told if there is some other type (such as String) to be used for returning the array of elements. The method uses reflection (java.lang.reflect.Array.newInstance) to create an array of the appropriate type.

For more information about using zero-Length arrays, see item 27 "Return zero-length arrays, not nulls" in "Effective Java Programming Language Guide" by Joshua Bloch.

Pixel
Pixel

IMPORTANT: Please read our Terms of Use, Privacy, and Licensing policies:
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy/
http://developer.java.sun.com/berkeley_license.html


Comments? Send your feedback on the JavaTM Technology Fundamentals Newsletter to: jdc-webmaster@sun.com

Subscribe to the following newsletters for the latest information about technologies and products in other Java platforms:

- Enterprise Java Technologies Tech Tips. Get tips on using enterprise Java technologies and APIs, such as those in the Java 2 Platform, Enterprise Edition (J2EETM).
- Wireless Developer Tech Tips. Get tips on using wireless Java technologies and APIs, such as those in the Java 2 Platform, Micro Edition (J2METM).

To subscribe to these and other JDC publications:
- Go to the JDC Newsletters and Publications page, choose the newsletters you want to subscribe to and click "Update".
- To unsubscribe, go to the subscriptions page, uncheck the appropriate checkbox, and click "Update".


ARCHIVES: You'll find the Core Java Technologies Tech Tips archives at:
http://java.sun.com/jdc/TechTips/index.html


Copyright 2002 Sun Microsystems, Inc. All rights reserved. 901 San Antonio Road, Palo Alto, California 94303 USA.

Sun, Sun Microsystems, Java, Java Developer Connection, J2SE, J2EE, and J2ME are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

Sun Microsystems, Inc.

-- END included message