Sortable spreadsheet view

From KitwarePublic
Revision as of 21:45, 16 September 2010 by Sebastien.jourdain

Introduction

The spreadsheet view lets the user browse the data much as in Excel. What was missing was sorting: the user had no quick way to find a given value hidden inside the table. Now, even when the data evolves through time, the table can be kept sorted to see how the other values behave. Sorting works on both scalar and vector values. For vectors, the sort is performed on the magnitude. Therefore, if you try to sort by the point coordinates of the default sphere source, the order won't change, because every point coordinate is expected to have a magnitude equal to the radius of the sphere, which is the same for every point.
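To make the magnitude rule concrete, here is a minimal Python sketch. It is not the view's actual code: the `magnitude` and `sort_rows` helpers, the `Pressure` column and the sample rows are all assumptions made up for illustration, with the sphere taken to be centred at the origin with radius 0.5.

```python
import math

def magnitude(value):
    """Sort key: the value itself for a scalar, the Euclidean norm for a vector."""
    if isinstance(value, (int, float)):
        return value
    return math.sqrt(sum(c * c for c in value))

def sort_rows(rows, column):
    """Return the row indices ordered by the magnitude of `column` (stable sort)."""
    return sorted(range(len(rows)), key=lambda i: magnitude(rows[i][column]))

# Hypothetical sample: four points on a sphere of radius 0.5 centred at the origin.
rows = [
    {"Points": (0.5, 0.0, 0.0), "Pressure": 3.0},
    {"Points": (0.0, -0.5, 0.0), "Pressure": 1.0},
    {"Points": (0.0, 0.0, 0.5), "Pressure": 2.0},
    {"Points": (-0.5, 0.0, 0.0), "Pressure": 0.5},
]

by_points = sort_rows(rows, "Points")      # every magnitude is 0.5: order is unchanged
by_pressure = sort_rows(rows, "Pressure")  # scalar column: ordinary numeric sort
```

Because all four points share the same magnitude, sorting by `Points` leaves the rows where they were, while sorting by the scalar column reorders them.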

How to use it

As in any spreadsheet view with column sorting, click the name of the column you want to sort by, and click it again to reverse the sort order. Be careful, though: changing the sort order, even on the same column, triggers a new sorting computation, which is not the case once you start browsing the data.

Special behavior when sorting on Process ID

Because the process ID is not available as an input of the filter, selecting it does not sort the table at all; the rows are simply kept in their current order. Therefore, if you sort by NodeId and then by ProcessId, the table is ordered first by ProcessId and, within each process, by NodeId.
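The effect can be illustrated with a small Python sketch. It assumes, purely for illustration, that the view lists the rows of each process one process after another; the `local_tables` data and both helper functions are hypothetical and not part of the filter.

```python
# Hypothetical local tables: process id -> list of (NodeId, value) rows.
local_tables = {
    0: [(7, "a"), (2, "b"), (5, "c")],
    1: [(4, "d"), (1, "e")],
}

def sort_locally(tables, key_index):
    """Sort the rows of each process by one column, as a regular column sort would."""
    return {pid: sorted(rows, key=lambda r: r[key_index]) for pid, rows in tables.items()}

def view_by_process_id(tables):
    """'Sort' by ProcessId: no re-sort, each process's rows are listed in their current order."""
    return [(pid, *row) for pid in sorted(tables) for row in tables[pid]]

after_node_sort = sort_locally(local_tables, key_index=0)  # user clicks NodeId
table = view_by_process_id(after_node_sort)                # user then clicks ProcessId
```

The resulting rows are grouped by process, and inside each group the earlier NodeId order is still visible.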

How it works

In the initial implementation, the client simply asks the server for a chunk of data between two row indexes. Building on that, we developed a distributed algorithm in which each process extracts the subset of its data that falls inside the requested chunk; a single process then merges the chunk subsets from all processes and sends the result to the client. With only one process, the sorting is done locally and building the chunk is trivial. In the distributed case, however, a set of operations has to be performed. The following list describes what each process does.

Process action

    • Sort the column locally, using the local data
    • Find the common value range across all the processes
    • Build a local histogram based on the global range
    • From the requested global row index, find the local data index by exchanging and reducing histogram ranges, accepting some error tolerance in the requested index
    • Find which process has the biggest chunk; it becomes the merger process
      • If NOT the merger process
        • Send the local chunk subset to the merger process
        • Return an empty table
      • If the merger process
        • Merge the chunks from all processes into a single table
        • Sort again
        • Trim the head and tail based on the error tolerance incurred while finding the indexes among the processes
        • Return the resulting table
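The steps above can be sketched as a single-machine simulation in Python, where each inner list stands in for the column held by one process. This is only an illustration under stated assumptions, not the filter's implementation: `fetch_chunk`, `bin_of`, the bin count and the sample data are all hypothetical, the "error tolerance" is modelled as the extra rows picked up by selecting whole histogram bins, and real inter-process communication is replaced by plain list operations.

```python
from bisect import bisect_right
from itertools import accumulate

def bin_of(value, lo, hi, nbins):
    """Histogram bin of `value` over the global range [lo, hi] (monotone in value)."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)

def fetch_chunk(local_columns, start, end, nbins=8):
    """Simulate the per-process steps for global rows [start, end) of the sorted column.

    Returns (per-process results, merger index); only the merger's result is non-empty.
    """
    nproc = len(local_columns)
    end = min(end, sum(len(col) for col in local_columns))
    if start >= end:
        return [[] for _ in range(nproc)], 0
    # 1. Each process sorts its own data.
    local_sorted = [sorted(col) for col in local_columns]
    # 2. Common value range (a min/max reduction in a real distributed run).
    lo = min(col[0] for col in local_sorted if col)
    hi = max(col[-1] for col in local_sorted if col)
    # 3. Local histograms built over the global range.
    histograms = []
    for col in local_sorted:
        counts = [0] * nbins
        for v in col:
            counts[bin_of(v, lo, hi, nbins)] += 1
        histograms.append(counts)
    # 4. Reduce the histograms and find the bins holding rows `start` and `end - 1`.
    cumulative = list(accumulate(sum(c) for c in zip(*histograms)))
    first_bin = bisect_right(cumulative, start)
    last_bin = bisect_right(cumulative, end - 1)
    rows_before = cumulative[first_bin - 1] if first_bin else 0
    # 5. Each process extracts the part of its data that falls in those bins.
    subsets = [[v for v in col if first_bin <= bin_of(v, lo, hi, nbins) <= last_bin]
               for col in local_sorted]
    # 6. The process with the biggest subset becomes the merger.
    merger = max(range(nproc), key=lambda p: len(subsets[p]))
    # 7. Merger: merge all subsets, sort again, trim the extra head and tail rows.
    merged = sorted(v for subset in subsets for v in subset)
    head = start - rows_before
    chunk = merged[head:head + (end - start)]
    # Non-merger processes return an empty table.
    return [chunk if p == merger else [] for p in range(nproc)], merger

# Hypothetical data spread over three processes.
local_columns = [[9.0, 1.0, 4.0, 7.5], [3.0, 8.0, 2.5], [6.0, 0.5, 5.0, 10.0, 2.0]]
results, merger = fetch_chunk(local_columns, start=3, end=7)
```

In this sketch only the values in the selected bins travel to the merger, so the amount of data exchanged depends on the chunk size and the bin width rather than on the size of the whole table.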

FAQ

Q: Why was this feature not there before, managed by Qt? A: The table can be so large that it does not fit in the client's memory, so sorting locally was not possible. In fact, only chunks of the table are requested from the server and silently integrated into the view, depending on which rows the user wants to see.
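As a rough illustration of that chunk-on-demand behaviour, here is a minimal Python sketch. The `ChunkedTableModel` class, its `fetch_rows` callback and the chunk size are hypothetical names chosen for this example; they do not correspond to the actual Qt model or server API.

```python
class ChunkedTableModel:
    """Client-side view of a large remote table that only holds the chunks it has needed."""

    def __init__(self, fetch_rows, chunk_size=4):
        self._fetch_rows = fetch_rows  # callable(start, end) -> rows; stands in for the server
        self._chunk_size = chunk_size
        self._chunks = {}              # chunk index -> rows already received
        self.requests = []             # (start, end) ranges actually requested

    def row(self, index):
        """Return one row, requesting its chunk from the server only on first access."""
        chunk_index = index // self._chunk_size
        if chunk_index not in self._chunks:
            start = chunk_index * self._chunk_size
            end = start + self._chunk_size
            self.requests.append((start, end))
            self._chunks[chunk_index] = self._fetch_rows(start, end)
        return self._chunks[chunk_index][index % self._chunk_size]

# A plain list plays the role of the server-side table in this sketch.
server_table = [i * i for i in range(20)]
model = ChunkedTableModel(lambda start, end: server_table[start:end])

first = model.row(9)    # triggers a request for rows 8..11
second = model.row(10)  # served from the chunk already held
third = model.row(17)   # triggers a request for rows 16..19
```

Only the chunks containing the rows that were looked at are ever transferred, which is why the client cannot sort the full table itself.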

Q: Can it work on huge distributed datasets? A: Yes, and it has been optimized to reduce both the data processing and the data communication.