Getting started

The first thing to do is to log in to the datacollider. When you received your user account via email, we also included the URL at which you can log in. Open that URL in Google Chrome (we currently support only Chrome because of the advanced WebGL features we use) and log in with your username and password. If you have forgotten your password, you can reset it from the login page.

After you have logged in, you’ll be taken to your dashboard, a welcome screen that gives you easy access to the most important things.

dashboard

From there you can create new projects, open your recent ones, and access example projects and this help section. We also show the most recent posts from our blog, where we write about recent developments and the new features we continuously release.

So, let’s get started! See the next guide to learn how to create a new project (hint: it’s super simple).

How does it work?

With the datacollider, we tried to make creating data visualizations from big datasets as easy as possible. This also meant clearly structuring the workflow and reducing the number of steps required to complete a visualization. We brought it down to three steps:

  • Selecting the data
  • Processing the data
  • Visualizing the data

The three sections of the datacollider workflow can be accessed via the top menu bar.

The first thing you have to do when creating a new project is to select the dataset you want to visualize. If your dataset represents events happening over time (say, data about taxi trips in NYC in December 2013), you will also have to specify the time range of the dataset you want to work with. More about this step can be found here.

Once this has been done, you can move to processing your data. This step allows you to filter, aggregate, group, annotate and map your data easily. You can learn how to do this here.
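To make the processing operations more concrete, here is a minimal Python sketch of what filtering, grouping and aggregating mean for a handful of taxi trips. It is only an illustration of the concepts; in the datacollider you do this through the user interface, and the record layout and field names below are made up for the example.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative only.
trips = [
    {"medallion": "A1", "pickup_hour": 0, "trip_time_in_secs": 540},
    {"medallion": "B2", "pickup_hour": 0, "trip_time_in_secs": 300},
    {"medallion": "A1", "pickup_hour": 1, "trip_time_in_secs": 30},
    {"medallion": "C3", "pickup_hour": 1, "trip_time_in_secs": 900},
]

# Filter: keep only trips that lasted at least one minute.
kept = [t for t in trips if t["trip_time_in_secs"] >= 60]

# Group: collect trip durations by pickup hour.
by_hour = defaultdict(list)
for t in kept:
    by_hour[t["pickup_hour"]].append(t["trip_time_in_secs"])

# Aggregate: average duration per pickup hour.
avg_secs = {hour: sum(v) / len(v) for hour, v in by_hour.items()}
```

Here `avg_secs` maps each pickup hour to the average duration of the trips that survived the filter.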

Lastly, after your data is in the shape you want it, you can move to the final step: creating your visualization. For that you’ll select a certain visualization type that suits your data and then select which fields from your dataset should be represented in the visualization. Read more about this last step here.

Creating a new project

In the datacollider, all your work will be organized as projects. It is pretty much up to you to decide what constitutes a project as you can use multiple datasets in one project and can create multiple visualizations as well.

But the best way to find this out is by simply trying it. You can create a new project from the dashboard by clicking on: ‘Start a NEW PROJECT’.

This will create your new project and load it for you. Initially the project will be completely empty. The first step will be to add a dataset to your project. This is why when the project is loaded, you will be shown the dataset selection page. Please see the Data Calendar guide to learn how to proceed from here.

Please note that if you would like to use your own data, you will need to upload it before creating the project. See the next guide for instructions on how to do so.

Uploading your own dataset

The datacollider provides a straightforward way to upload your own datasets. Uploaded datasets will only be available to you. You can upload a dataset either from the quick start section on the dashboard page or from the dataset page.

Dashboard quick start

After you click the upload dataset button, you will be guided through a series of steps to prepare your dataset for use in the datacollider. This is necessary because the tool requires all data to be in the same internal format. But don’t worry, the steps are pretty straightforward:

In the beginning you’ll be asked to choose which kind of dataset you would like to prepare. Usually this would be a temporal dataset, meaning a dataset where each record has a timestamp that ties an event to a point in time. We’ll assume a temporal dataset (some open NYC taxi data) in this short guide; however, the process is similar for other kinds of data.

structuring1

First, you need to upload your raw dataset. The data can be in a variety of formats; however, each record needs to be on a separate line and all fields of a record must be separated by the same delimiter. In our case, we are using a comma-separated file for the New York taxi dataset. If your dataset consists of multiple files, you should upload all of them now because they cannot be added to the dataset later.
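If you want to check a file against these two requirements before uploading it, a small script can help. The following is a minimal Python sketch, not part of the datacollider; the function name and the sample rows are made up for illustration. It counts how many records have each number of fields, so a well-formed file produces a single entry.

```python
import csv
import io
from collections import Counter


def field_count_report(text, delimiter=","):
    """Count how many records have each number of fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # csv.reader yields [] for completely empty lines; skip those.
    return Counter(len(row) for row in reader if row)


# Hypothetical sample: the last record is missing one field.
sample = (
    "medallion,pickup_datetime,trip_time_in_secs,pickup_longitude\n"
    "A1,2013-12-01 00:13:00,540,-73.98\n"
    "B2,2013-12-01 00:15:00,300\n"
)
report = field_count_report(sample)
```

A report with more than one entry, as in this sample, points to records with missing or extra delimiters that are worth fixing before the upload.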

structuring2

As a second step, you need to specify the structure of your file. The first field asks for the line number of the header row; this simplifies a later step, where you give a name to each field. If your dataset doesn’t have headers, you can leave this blank. The second and third fields describe the syntax of your file. The second lets you ignore lines starting with a certain prefix (such as ‘#’ for comments). The third specifies the field delimiter (for CSV this would be a comma). Once you have entered the parameters, you can click the Parse sample data button to see whether your file is parsed correctly.
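To illustrate how the three settings work together, here is a rough Python sketch of a parser driven by a header line number, a comment prefix and a delimiter. It is not the datacollider's actual parser. The function and parameter names are hypothetical, and it assumes that the header line number counts every line of the file, including comment lines, which may differ from how the tool counts.

```python
def parse_sample(lines, header_line=None, comment_prefix=None, delimiter=","):
    """Split raw lines into (header, records) using the three settings.

    header_line is 1-based, or None if the file has no header row.
    """
    header = None
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        if comment_prefix and line.startswith(comment_prefix):
            continue  # skip comment lines
        fields = line.split(delimiter)
        if header_line is not None and number == header_line:
            header = fields
        else:
            records.append(fields)
    return header, records


# Hypothetical sample file with one comment line and a header on line 2.
raw = [
    "# NYC taxi trips, December 2013\n",
    "medallion,pickup_datetime,trip_time_in_secs\n",
    "A1,2013-12-01 00:13:00,540\n",
    "B2,2013-12-01 00:15:00,300\n",
]
header, records = parse_sample(raw, header_line=2, comment_prefix="#")
```

The plain `split` keeps the sketch short but does not handle quoted fields that contain the delimiter; for such files Python's `csv` module is the safer choice.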

structuring4

On the third page, you’ll need to declare the details for each field. These include the name, an optional description, the value range, the data type and an optional alternative null value. Please see the screenshot below for an example. We defined the medallion ID as a field of type String, the trip time in seconds as a field of type long (which is basically an integer) and the latitude and longitude as fields of type double (a floating point number).
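One way to think about these field declarations is as a small schema that turns the raw text of each record into typed values. The sketch below shows the idea in Python. The class, the type mapping and the sample row are assumptions made for illustration, not the datacollider's internal format, and the value range is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

# Python stand-ins for the three data types named above (an assumption).
CASTS = {"String": str, "long": int, "double": float}


@dataclass
class FieldSpec:
    name: str
    data_type: str                     # "String", "long" or "double"
    description: str = ""
    null_value: Optional[str] = None   # raw text that means "no value"

    def convert(self, raw):
        """Turn one raw text value into a typed value, or None if null."""
        if raw == "" or raw == self.null_value:
            return None
        return CASTS[self.data_type](raw)


# Hypothetical schema for four fields of the taxi dataset.
schema = [
    FieldSpec("medallion", "String", "Taxi medallion ID"),
    FieldSpec("trip_time_in_secs", "long", "Trip duration in seconds"),
    FieldSpec("pickup_latitude", "double", null_value="0"),
    FieldSpec("pickup_longitude", "double", null_value="0"),
]

row = ["A1", "540", "40.75", "0"]
typed = {spec.name: spec.convert(value) for spec, value in zip(schema, row)}
```

In this example the longitude value "0" matches the declared alternative null value, so it is converted to `None` instead of the number 0.0.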

structuring5

Since we are preparing a temporal dataset, we need to select the field in each record that represents the event time. In our case, we use the pickup date as the timestamp. You can also select multiple fields, for example if your time information is split into a date field and a time field.

Once you have selected the fields and pressed next, you will have to enter the format of your timestamp and the time zone it is in. The description of the date format syntax can be found at the link specified. Please also note the example below to get a better understanding of how this works.

structuring6

structuring7
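To show why both the format and the time zone are needed, here is a minimal Python sketch that turns timestamp text into an unambiguous point in time. Note the assumptions: the format strings use Python's `strptime` directives, which are not necessarily the syntax the datacollider expects (use the syntax described at the link on that page), and the time zone is simplified to a fixed UTC-5 offset. The function name and sample values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset (New York standard time in December).
# A real time zone database would also handle daylight saving time.
NEW_YORK_WINTER = timezone(timedelta(hours=-5))


def parse_event_time(parts, fmt, tz):
    """Join one or more timestamp fields and parse them into an aware datetime."""
    text = " ".join(parts)
    return datetime.strptime(text, fmt).replace(tzinfo=tz)


# A single field holding both date and time.
pickup = parse_event_time(
    ["2013-12-01 00:13:00"], "%Y-%m-%d %H:%M:%S", NEW_YORK_WINTER
)

# Date and time split across two fields.
split = parse_event_time(
    ["12/01/2013", "00:13"], "%m/%d/%Y %H:%M", NEW_YORK_WINTER
)

# Convert to UTC so that events from different zones can be compared.
pickup_utc = pickup.astimezone(timezone.utc)
```

Without the time zone, the text "00:13" could refer to many different instants; with it, both inputs above resolve to the same moment.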

And that’s it. Once you have entered all this information, you can click the start structuring button to begin. The processing time depends on the size of your data files. For small datasets, this should only take a few minutes. We’ll send you an email when the dataset is ready to use. Once you get the email, log back in to the datacollider and you will see your new dataset in the list of datasets available to you.

structuring8


Still have questions after reading the guides? Write to us or head over to our FAQ section.