Move db handling to greenDAO #206

Closed
opened 7 years ago by cpfeiffer · 5 comments
Owner

We should port our manual db handling code to http://greendao-orm.com/

Owner

Now that there's greendao in master, I believe we should start discussing what we want from the schema.

The following describes a proposal for the new database schema. With @comments@ and ?questions?

General considerations:

  • While storing data we don't want to cast the raw data to known values
  • While reading data we generally want to get only known values
    @E.g. we want to save activityTypes that we are not aware of, but generally we want to show the user only the ones we know about (sleep, deep sleep, etc.)@
  • While storing data we want to keep them as pristine as possible (i.e. model the sample storage tables after the incoming data)
  • While reading the data we want to have mappings to access this diverse data storage while getting uniform return values
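
To make the "store raw, normalize on read" idea a bit more concrete, here is a minimal sketch in plain Java. It is not taken from the codebase: `ActivityKind`, `RawKindMapper` and the raw values used with them are hypothetical names and numbers chosen only for illustration. The point is that the stored integer is never rewritten; a per-device mapping translates it when reading, and anything unrecognized falls back to an "unknown" kind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical normalized kinds shown to the user; names are illustrative only.
enum ActivityKind { UNKNOWN, ACTIVITY, LIGHT_SLEEP, DEEP_SLEEP }

// Hypothetical per-device mapping from raw stored values to normalized kinds.
class RawKindMapper {
    private final Map<Integer, ActivityKind> known = new HashMap<>();

    // Registers a raw device value as a known kind; returns this for chaining.
    RawKindMapper register(int rawValue, ActivityKind kind) {
        known.put(rawValue, kind);
        return this;
    }

    // The raw value stays untouched in storage; only the read path normalizes it.
    ActivityKind normalize(int rawValue) {
        return known.getOrDefault(rawValue, ActivityKind.UNKNOWN);
    }
}
```

With such a mapper, a raw activity type that we learn to interpret later only needs a new `register` call; the rows already stored do not have to be migrated.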

User:
birthday, gender, height, weight
?do we want more than one user, if so: ID?

Device:
id,user_id,?shortname?,?system_id?,!TBD!
?a device is tied to the user (because of personal data), so it's sufficient to have the device as foreign key in the following tables.
If the same device is reused by another user in the future, it must get a new ID.?

Activity_dataprovider_type:
id,description @eg. HRM, steps, activity recognition, ...@
@this points to the different tables documented below, but the link logic cannot live at the DB layer unfortunately@

Device_activity_dataprovider_type_link:
device_id,activity_dataprovider_type_id
@N-to-N relationship between device and supported providers@

Data providers: @The schema of each dataprovider depends on the specific data type, possibly even by the device!
Where possible, they are abstracted to a MCD (e.g. steps per minute and HRM per minute)@

HRM_minute_average:
device_id,timestamp,HRM_value
@used by miband 1s, HRM straps (unsupported as of now), pebble 2 (unsupported as of now)@

steps_minute_average:
device_id,timestamp,steps_count
@used by miband*, pebble time *@

miband_specific_data_minute_average:
device_id,timestamp,activity_intensity,activity_type,...

pebble_specific_data_minute_average:
device_id,timestamp,orientation,VMC,light_intensity,...
@used by pebble time *@

miband_live_activity_measurement:
device_id,timestamp,?all_the_readings_serialized?

pebble_activity_data_overlay:
device_id,timestamp_start,timestamp_end,activity_type,...
@used by pebble time@

user_entered_activity_data_overlay:
device_id,timestamp_start,timestamp_end,...
@used to store the "corrections" the user enters into GB. Eg. sleep vs non-sleep, etc.@

Poster
Owner

General remarks:

  • For some entities we differentiate between static data and volatile data.
    • static data: e.g. we consider the name of a user as static. Changing the name is possible, and there will be no further reference to the old name anywhere
    • volatile data: e.g. the height or weight of a user is volatile. Activity data is sensitive to the weight and height, so we make sure that activity data captured can always be related to the height and weight that the user had at that time.
    • We do that by saving volatile data in separate tables (e.g. UserAttributes and DeviceAttributes) and giving the entries a validity date. As soon as any volatile data is changed, the old table row is marked as invalid from now on, and a new row is created and marked as valid from now on.
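
A rough sketch of the validity-date idea, in plain Java rather than greenDAO entities. The class and field names (`UserAttributesRow`, `validFrom`, `validTo`, and so on) are assumptions made for this example, not the actual schema. An update closes the currently valid row and appends a new one, so a sample's timestamp can always be matched to the attributes that applied at that time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row of a UserAttributes-like table; field names are assumptions.
class UserAttributesRow {
    final int heightCm;
    final int weightKg;
    final long validFrom; // epoch seconds, inclusive
    Long validTo;         // epoch seconds, exclusive; null while the row is current

    UserAttributesRow(int heightCm, int weightKg, long validFrom) {
        this.heightCm = heightCm;
        this.weightKg = weightKg;
        this.validFrom = validFrom;
    }

    boolean isValidAt(long timestamp) {
        return timestamp >= validFrom && (validTo == null || timestamp < validTo);
    }
}

// In-memory stand-in for the table, just to show the update and lookup logic.
class UserAttributesHistory {
    private final List<UserAttributesRow> rows = new ArrayList<>();

    // Closes the current row (if any) and appends a new row valid from `now`.
    void update(int heightCm, int weightKg, long now) {
        for (UserAttributesRow row : rows) {
            if (row.validTo == null) {
                row.validTo = now;
            }
        }
        rows.add(new UserAttributesRow(heightCm, weightKg, now));
    }

    // Returns the row that was valid at the given time, or null if there is none.
    UserAttributesRow at(long timestamp) {
        for (UserAttributesRow row : rows) {
            if (row.isValidAt(timestamp)) {
                return row;
            }
        }
        return null;
    }
}
```

Looking up `at(sampleTimestamp)` then gives the weight and height to use for that sample, regardless of how often the user changed them afterwards.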

Remarks regarding the tables

✅ User -- done.
🔸 Device -- done, except for the link to the user. ATM the user is linked to each sample, not the device. If we want to have it at the device, it should be considered volatile and go into DeviceAttributes, I think.
❓ I don't quite get the Activity_dataprovider_type. Isn't this just an Activity_type?
❓ Why steps_minute_average? It's the actual steps per minute, not an average value, no?
❓ miband_live_activity_measurement: I'd like to have a single table for all intensity values, a single table for all HR measurements and a single table for all steps. What's the reason for splitting them?

A final remark: I'm afraid that having so many device specific tables will make the implementation rather complex. I'd vote for having as few device specific tables as possible.

Owner

I didn't get the static vs volatile data thing. It makes great sense!

Poster
Owner

Here's a summary of the current state, as well as the reasons for it.

On the Road

First, we did consider using separate tables for every sample type (i.e. one table for heart rate samples, one for activity samples (intensity), one for steps, one for light intensity, etc.). But since we want to store raw values, that is, device specific values that are normalized at runtime, we would have even more tables. E.g. activity intensity values are device specific, and so are activity type values. Only a few sample types like heart rate (= bpm) and steps (total per day, or maybe a delta) would be device independent.

So instead of having a few device independent tables plus a few device specific tables and all the complexity of mixing and matching these at runtime, we went for having a single table per device, or more correctly, per sample provider (there are different software implementations that provide samples for the Pebble, for example).

Now that we had settled on one specific table per device type, we still wanted to have separate sample interfaces (e.g. HeartRateSample, ActivitySample, LightIntensitySample, etc.).

When implementing the client side for that, we noticed two drawbacks of having such separate interfaces:

  1. you need separate queries per sample type, and separately iterate over the results and do something with them.
  2. if you aggregate data from different sources, you could either aggregate only samples of the same type (e.g. heart rate samples from one device with heart rate samples from another device), or mix heart rate samples with activity samples. In the latter case you would have to perform an instanceof check for every sample, and another validity check for its value.
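
For illustration, this is roughly what client code looks like with separate interfaces. It is a sketch only: `TimedSample`, `StepsSample` and `MixedSampleClient` are hypothetical names, the `HeartRateSample` here is a simplified stand-in, and treating negative values as invalid is an assumption made for the example.

```java
import java.util.List;

// Hypothetical separate sample interfaces, as considered before the current design.
interface TimedSample { int getTimestamp(); }
interface HeartRateSample extends TimedSample { int getHeartRate(); }
interface StepsSample extends TimedSample { int getSteps(); }

class MixedSampleClient {
    // Sums steps from a mixed list; every element needs a type check first.
    static int totalSteps(List<TimedSample> samples) {
        int total = 0;
        for (TimedSample sample : samples) {
            if (sample instanceof StepsSample) {
                int steps = ((StepsSample) sample).getSteps();
                // Second check: is the value itself valid? (negative = invalid is an assumption)
                if (steps >= 0) {
                    total += steps;
                }
            }
        }
        return total;
    }
}
```

Every consumer of a mixed list ends up repeating this type check plus value check, once per sample type it cares about.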

Having separate interfaces also made the backend implementation a tad complicated, so these three things made us go another route, which is what we have right now in master.

Current State

  1. we have separate tables per device type, e.g. one for Mi Band, one for Pebble Health, one for Pebble Misfit, one for Pebble Morpheuz, etc.
  2. we have one single interface for all activity samples, currently named ActivitySample, but about to be renamed to Sample. This interface contains the parts about heart rate, activity, light intensity, steps, etc.
  3. all samples implement this interface, even if the device is just a simple step counter
  4. the abstract base implementation for all specific sample classes returns a NOT_MEASURED value by default
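
A minimal sketch of how these four points fit together. This is not the code in master: the method names, the `StepCounterSample` class, the `SampleClient` helper and the value `-1` for `NOT_MEASURED` are assumptions made for this example. Only the overall shape (one interface, an abstract base returning `NOT_MEASURED`, device classes overriding what they measure) follows the description above.

```java
import java.util.List;

// One interface for all samples; the sentinel value -1 is an assumption.
interface Sample {
    int NOT_MEASURED = -1;

    int getTimestamp();
    int getSteps();
    int getHeartRate();
    int getRawIntensity();
}

// Base class: everything that a device does not measure reports NOT_MEASURED.
abstract class AbstractSample implements Sample {
    private final int timestamp;

    AbstractSample(int timestamp) { this.timestamp = timestamp; }

    @Override public int getTimestamp() { return timestamp; }
    @Override public int getSteps() { return NOT_MEASURED; }
    @Override public int getHeartRate() { return NOT_MEASURED; }
    @Override public int getRawIntensity() { return NOT_MEASURED; }
}

// Hypothetical simple step counter: overrides only what it actually measures.
class StepCounterSample extends AbstractSample {
    private final int steps;

    StepCounterSample(int timestamp, int steps) {
        super(timestamp);
        this.steps = steps;
    }

    @Override public int getSteps() { return steps; }
}

class SampleClient {
    // No instanceof needed: unmeasured values are skipped by a plain comparison.
    static int totalSteps(List<? extends Sample> samples) {
        int total = 0;
        for (Sample sample : samples) {
            if (sample.getSteps() != Sample.NOT_MEASURED) {
                total += sample.getSteps();
            }
        }
        return total;
    }
}
```

A client interested only in heart rate would do the same with `getHeartRate()` and simply never call the other getters.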

Benefits

  1. simple(r) implementation in the backend including the SampleProvider implementations
  2. simple client interface, only has one interface to all samples, no instanceof necessary. If the client queries for heart rate samples only, it can easily ignore all the methods about steps and light in the sample interface.
  3. missing sample values can be recognized easily by comparing with ActivitySample.NOT_MEASURED.
  4. adding support for a new sample type is as easy as adding the appropriate methods to the Sample interface, as well as default implementations to the AbstractSample class. Then a device specific SampleProvider and Sample implementation can actually handle the new sample type and a client has to make use of it.

So the main benefit and reason we did it that way is reduced complexity on all levels.

Owner

Closed with 0.12

Reference: Freeyourgadget/Gadgetbridge#206