The importance of open data

I’ve been meaning to write about the importance of open data for a while, but it was the publicity given to Google Map Maker that made me really understand the urgency of the matter.

Can you imagine a country with geographic data so poor that even the government doesn’t know which cities and towns it has? How could it invest in roads, literacy or drinking water, or even know that people live there? How could it collect taxes or… count votes in elections!? Can you imagine a battalion of soldiers using wrong maps and establishing a base in the neighbouring country? That absurdity happened recently on the border between Nicaragua and Costa Rica, and it almost caused an international conflict.

Public Data

If institutions publish their data and make it freely accessible, anyone can verify its accuracy and suggest changes or corrections. But while this data remains locked away in dusty archives, the same mistakes will be made over and over again. We are not talking about sensitive data or national security; we are talking about data that anyone who is physically present at the location can check.

But it is not enough for open data to be freely available. It must also be free to use. I gain nothing by looking at a map on page X of the Public Service if I cannot use the data I am seeing. Seeing the traffic before I leave home can help me, but if my GPS cannot use that information to guide me along the best route, it is useless.

Well, someone may say, if the source of the data (for example, the government) provides all the services we will need, we don’t need free use of the data. That is not enough. Why? Because open data may have a myriad of uses. It is a newly opened market to explore.

Private Map Providers

But how does this benefit the private map provider? Are we suggesting that they run data servers and offer data without charging for its use? Is this the culture of everything for free? Of course not; nobody in their right mind would ask for that. The private provider can gain great benefits from releasing its data (other than charging for services based on it):

The first benefit is straightforward: if you manage a large community, the cost of updating and expanding your data is greatly reduced. Vendors like TomTom or Nokia are beginning to understand the importance of updates from their own users. OpenStreetMap is another clear and direct example of the power of users: a source of geographic data that can compete with (and beat) Google Maps or Bing, created entirely from free data supplied by its users.

The second advantage is less obvious. Leaving aside all the classic advantages of freedom, there is still one more: you can always charge for commercial or intensive use. Although it may not benefit you at the beginning, if your data is good enough, sooner or later someone will find a use for it.

Google Map Maker

Some sharp readers will ask, at this point, whether this is not exactly what Google Map Maker does. Do they not collect updates from their users, give them maps for free and charge only for intensive or commercial use? No. To begin with, the data isn’t free. This means that if you collaborate with Google Map Maker and update their maps, and tomorrow you want to use this data to set up a commercial service, you cannot do so without going through a convoluted series of licenses. However, if instead of working with Google Map Maker you contribute to a free platform for geographic data, you will be able to use that data in your service without problems.

Does this mean that I think Google Map Maker is useless? Not at all. Someone will probably find a good use for it. But whatever the intended use, you can always get at least the same functionality with OpenLayers, OpenStreetMap data and the free PNOA and Cadastre data (recently released). So why use a proprietary platform when you can use a much more powerful free one?

But Google is good, someone may say; it offers free, quality data. Sure, no doubt. But never forget that Google, beyond any good intentions, remains a business, and in the end the top priority of a company is to generate business in order to survive. If Google has to change course and get rid of free offerings that are inconsistent with its business, it will. In fact, it already does.

High Concurrency

When facing high-concurrency applications, we often run into a number of generic problems. In this article I will focus on resource problems (CPU and memory) and, for now, on the most typical and most direct solutions.

When we discover threads and the advantages of parallel processing, we can end up abusing them. We have a lot of threads (100? 1000?) running simultaneously, and the processor keeps jumping from one to another without letting any of them finish, no matter how fast their actual execution is. Over time there will be more and more threads, which only slows the process down. On top of the cost of executing each thread, we must also consider the added cost of creating and destroying threads, which can become significant when we are talking about so many threads at once.

High Concurrency with the Thread Pool Pattern

Threads: the holy grail

In this case, the first approach that comes to mind is the Thread Pool Pattern. This pattern limits the number of threads running at the same time.
Instead of creating new threads, we create tasks, which are queued. We also have a pool of threads that pick these tasks up and run them as soon as possible. A classic example of this pattern can be found in SwingWorker. If we want to implement the pattern ourselves, we should take a look at the ExecutorService interface.
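
As a rough illustration of the pattern, here is a minimal sketch built on ExecutorService. The class name, the method name and the task itself (squaring a number) are made up for the example; only the java.util.concurrent calls are real. It queues a number of small tasks on a fixed-size pool and collects their results:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the names and the task are illustrative only.
final class ThreadPoolSketch {

    // Runs taskCount small tasks on a pool of at most poolSize threads
    // and returns the sum of their results.
    static long sumOfSquares(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= taskCount; i++) {
                final long n = i;
                Callable<Long> task = () -> n * n;
                // submit() queues the task; an idle pool thread picks it up.
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown(); // stop accepting tasks and let the workers exit
        }
    }

    public static void main(String[] args) {
        // 100 tasks, but never more than 4 threads running at once.
        System.out.println(sumOfSquares(4, 100));
    }
}
```

However many tasks are submitted, the number of live threads never exceeds the pool size, and the threads are reused instead of being created and destroyed for every task.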

If we have a background thread that makes heavy use of the processor, and we do not mind slowing it down, we can use sleep (Thread.sleep(...)) to make it release the processor periodically, allowing other threads to run faster.
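
A minimal sketch of this idea follows. The class, the method names and the placeholder workload are assumptions made for the example; the only real API used is Thread.sleep. The work is split into slices and the thread pauses after each one:

```java
// Hypothetical example; class, method and parameter names are illustrative.
final class MaintenanceSketch {

    // Placeholder for one slice of CPU-heavy maintenance work.
    private static long doOneChunk(int chunk) {
        long acc = 0;
        for (int i = 0; i < 100_000; i++) {
            acc += (long) i * chunk;
        }
        return acc;
    }

    // Runs `chunks` slices of work, sleeping `pauseMillis` after each one so
    // that other threads get processor time. Returns how many slices finished.
    static int runInChunks(int chunks, long pauseMillis) {
        int done = 0;
        for (int i = 0; i < chunks; i++) {
            doOneChunk(i);
            done++;
            try {
                Thread.sleep(pauseMillis); // give up the processor for a while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop early
                break;
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // Run the maintenance work in its own background thread.
        Thread maintenance = new Thread(() -> runInChunks(5, 50), "maintenance");
        maintenance.start();
    }
}
```

The length of the pause is a trade-off: a longer pause leaves more processor time to the other threads but makes the background work take longer.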

Sleeping like this is useful for threads running in maintenance mode, which must be kept running but do not have to respond in real time. Another way to temporarily stop a running thread while another one works is the join method (Thread.join()), which makes a thread wait until another thread ends. It is most useful when one thread clearly has higher priority than another, but it is not viable unless the lower-priority thread holds a reference to the higher-priority thread it has to wait for.
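
The join case can be sketched as follows. The class name and the two thread roles ("urgent" and "background") are invented for the example; note that the background thread needs a reference to the urgent thread in order to wait for it:

```java
// Hypothetical example; the thread roles and names are illustrative only.
final class JoinSketch {

    // Starts an urgent thread and a background thread that waits for it.
    // Returns the order in which the two threads did their work.
    static String runInOrder() {
        StringBuffer log = new StringBuffer(); // thread-safe append

        Thread urgent = new Thread(() -> log.append("urgent;"));

        Thread background = new Thread(() -> {
            try {
                urgent.join(); // wait until the urgent thread has finished
                log.append("background;");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Start the urgent thread first: join() on a thread that has not
        // been started yet returns immediately.
        urgent.start();
        background.start();

        try {
            background.join(); // wait for both before reading the log
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runInOrder());
    }
}
```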

High Concurrency issues

But high concurrency is not only a matter of processor use. Multiple threads may need access to large amounts of information almost simultaneously. These threads will not only be duplicating the information in memory, but will often be repeating the entire process of extracting it.

This problem is usually solved in the majority of data access libraries (mostly for databases). For example, we have ehcache, which uses threads to store information (Thread-Specific Storage Pattern). This way, access to and storage of this information is shared, which decreases both the memory usage and the processor time required to extract and shape the information. When the threads want to process this information, they ask ehcache for the data, and ehcache optimizes these hits.

To improve on this solution we have the concurrent collections. These allow different threads to use the same objects without concurrency problems.
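
To make the idea of shared access concrete, here is a minimal sketch of a cache shared between threads, built on ConcurrentHashMap from the standard concurrent collections. It is not the ehcache API; the class, the key and the "expensive load" are stand-ins invented for the example. Several threads ask for the same key, and the extraction runs only once:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example; plain java.util.concurrent, not the ehcache API.
final class SharedCacheSketch {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    // Stand-in for an expensive extraction such as a database query.
    private String expensiveLoad(String key) {
        loads.incrementAndGet();
        return "value-for-" + key;
    }

    // computeIfAbsent runs the loader at most once per key, even when
    // several threads ask for the same key at the same time.
    String get(String key) {
        return cache.computeIfAbsent(key, this::expensiveLoad);
    }

    int loadCount() {
        return loads.get();
    }

    // Has `threads` workers ask for the same key and returns how many
    // times the expensive load actually ran.
    static int loadsForSameKey(int threads) {
        SharedCacheSketch shared = new SharedCacheSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> shared.get("city:42"));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.loadCount();
    }

    public static void main(String[] args) {
        System.out.println(loadsForSameKey(8));
    }
}
```

A real cache would also need eviction and expiry, which is exactly what a dedicated library provides; the sketch only shows that a concurrent collection lets the threads share one copy of the data safely.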

There are more ways to handle high concurrency (without going into optimizations of the code itself), but those described here are usually good places to start.

Easy map on Java

Sometimes you don’t know where to start when you enter the world of GIS programming. There are too many libraries and IDEs, and the truth is that everyone assumes you already have a foundation, so everything becomes chaos. Something as simple as how to develop a map in Java has scarce documentation.

If you have absolutely no idea about GIS, I would recommend that you start with the Free book of Free GIS by Victor Olaya.

For beginners, I would recommend taking a look at a fairly new project aimed at extending Swing (the default Java graphics library) with geographical widgets. This way, adding a map to a Java desktop application becomes a task as simple as adding a button or a text field.

Of course, GIS applications have some complexity, and a simple display like this is not enough. But it is a good starting point to get familiar with what a map is and what a developer can do with it.

We start with a Java project and add SwingX-WS to its dependencies. Then, the following code would show a window with a simple map:

package es.emergya.gis.examples;

import java.awt.BorderLayout;

import javax.swing.JFrame;

import org.jdesktop.swingx.JXMapKit;
import org.jdesktop.swingx.mapviewer.GeoPosition;

public class SwingWS {

  public static void main(String[] args) {
    // Top-level window that will hold the map
    JFrame form = new JFrame("Map");

    // Map widget from SwingX-WS, using OpenStreetMap tiles
    JXMapKit jXMapKit1 = new JXMapKit();
    jXMapKit1.setDefaultProvider(org.jdesktop.swingx.JXMapKit.DefaultProviders.OpenStreetMaps);
    jXMapKit1.setDataProviderCreditShown(true);
    jXMapKit1.setName("jXMapKit1"); // NOI18N
    // Center the map on latitude 41.881944, longitude 39.627778
    jXMapKit1.setAddressLocation(new GeoPosition(41.881944, 39.627778));

    form.getContentPane().add(jXMapKit1, BorderLayout.CENTER);

    form.pack();
    form.setVisible(true);
  }
}

The map tiles are drawn from OpenStreetMap, but the widget is fully configurable for any WMS server.

So now you have your map in Java.
