#2159 Upload database to an InfluxDB timeseries database

Open
opened 2 weeks ago by tjhowse · 9 comments
tjhowse commented 2 weeks ago

Related to #49

My situation: I just got a fitness band and set it up with Gadgetbridge. I'd love to be able to have GB automatically push data from its database up to my local InfluxDB timeseries database such that I can do pretty graphs and the like.

I'm happy to put in the work to create an interface between GB and InfluxDB. I've skimmed this project and my first guess would be to start with PeriodicExporter and create a PeriodicUploader. We'd need to add settings for the InfluxDB address (hostname, port), credentials, and version. I don't propose to do any interpretation of the data in GB, just dump the timeseries information to the server via InfluxDB's REST API.
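For the sake of discussion, a minimal sketch of what such a PeriodicUploader's core could do, written in Python for brevity and assuming an InfluxDB 1.x server. The measurement name, field names, host, and database below are my own placeholders, not Gadgetbridge's actual schema or settings:

```python
# Hedged sketch: push samples to InfluxDB 1.x via its HTTP write API.
# "activity", "steps", "heart_rate", and the defaults are assumptions.
import urllib.parse
import urllib.request

def to_line_protocol(measurement, fields, timestamp_s):
    """Format one sample as InfluxDB line protocol (seconds precision)."""
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement} {field_str} {timestamp_s}"

def upload(lines, host="localhost", port=8086, db="gadgetbridge"):
    """POST a batch of line-protocol lines to InfluxDB's /write endpoint."""
    query = urllib.parse.urlencode({"db": db, "precision": "s"})
    req = urllib.request.Request(
        f"http://{host}:{port}/write?{query}",
        data="\n".join(lines).encode(),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # InfluxDB answers 204 No Content on success

if __name__ == "__main__":
    sample = to_line_protocol("activity", {"steps": 42, "heart_rate": 71}, 1614000000)
    print(sample)
    # upload([sample])  # needs a running InfluxDB instance
```

The point being that "no interpretation" really can mean just formatting rows as line protocol and POSTing them.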

Does anyone else think this would be effort well spent?

Cheers,
tjhowse.

tjhowse commented 2 weeks ago
Poster

Thanks to @vanous I now understand that GB has historically been a strictly offline-only piece of software. I think there are good privacy-related reasons for this remaining the case.

@vanous proposed a few options in #2160, including having online and offline distributions of GB, and having a second app receive a broadcasted intent from GB and handing off the data for the upload.

I think having two distributions would work, however it could potentially confuse new users and dilute the stats, making GB harder to find and less popular than it deserves to be.

I propose having a second app (GBSync for the purpose of this discussion) that broadcasts an intent that GB receives. Upon receipt, GB dumps the needful data, in InfluxDB line protocol, to a file accessible to other apps. Once the dump is complete GB broadcasts an intent (including filename/path) back to GBSync. Upon receipt of this signal GBSync uploads the data from the file and deletes the file.
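To make the handoff concrete, here is a minimal sketch of the two sides of that exchange, in Python purely for illustration (the real code would live in GB/GBSync on Android, and the measurement and field names are placeholders):

```python
# Illustrative only: the GB side dumps samples to a shared file in InfluxDB
# line protocol; the GBSync side reads it, hands the lines to an uploader,
# and deletes the file. Row shape (timestamp, steps) is an assumption.
import os

def dump_export(rows, path):
    """GB side: write samples to a shared file in line protocol."""
    with open(path, "w") as f:
        for ts, steps in rows:
            f.write(f"activity steps={steps} {ts}\n")
    # GB would then broadcast an intent containing `path` back to GBSync

def consume_export(path):
    """GBSync side: read the dump for upload, then delete the file."""
    with open(path) as f:
        lines = f.read().splitlines()
    os.remove(path)  # cleanup once the hand-off is complete
    return lines
```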

How does that sound?

vanous commented 2 weeks ago
Poster
Collaborator

hi @tjhowse,

cool. Instead of intents, I think we will need to use Content Providers (I was the one mentioning Intents before, just for brevity).

The question is: do we have the time and energy to work on GBSync, or do we create a very simple proof of concept and then let people make/provide their own "sync" receivers, which users would enable via settings in Gb?

tjhowse commented 2 weeks ago
Poster

I think that's more of an architectural question than an implementation one. I'm familiar with the nuts and bolts of SQLite and InfluxDB, but I'm less familiar with GB and mobile apps in general.

Do we add support in GB to receive an external sync signal (intent/content provider/etc.) with a format specified (InfluxDB for starters), after which GB performs the export of the specified type and broadcasts a signal to indicate the export is complete?

nagimov commented 1 week ago
Poster

Here's how I automatically push data from Gadgetbridge to InfluxDB:

- set up periodic auto-export of the database file: https://codeberg.org/Freeyourgadget/Gadgetbridge/wiki/Data-Export-Import-Merging-Processing#user-content-data-auto-export
- sync the backup folder to a Linux box via Syncthing: https://f-droid.org/en/packages/com.nutomic.syncthingandroid/
- set up a cron job with a Python script that periodically parses the SQLite DB file and pushes data to InfluxDB - here are a few useful examples: https://codeberg.org/Freeyourgadget/Gadgetbridge/wiki/Data-Export-Import-Merging-Processing#draw-charts-for-year-month-week-day-in-python
tjhowse commented 1 week ago
Poster

That would definitely work for me, and I suppose anyone savvy enough to run their own InfluxDB server could probably manage to set up the above system.

The only technical problem I can think of, other than the complexity of setting up and maintaining all the moving parts, is that over time the sync would get slower as the database grows. I suppose the user could periodically wipe the gadgetbridge database to reduce the size?

Ideally I'd prefer a less complex system, but I'm far more comfortable writing python glue on a server than I am building mobile apps.

nagimov commented 1 week ago
Poster

Yeah same here. It takes no time to throw 50 lines of python together. Installing android studio is a new kind of pain every time.

Sync times shouldn't be noticeable at all. My ~100-day-old SQLite db (~150k rows in the main data table) is ~3.5MB, which gives ~12MB per year. SQLite is pretty fast, and only newly added data rows need to be queried, so even a Raspberry Pi won't break a sweat.

So something like:

```python
from datetime import datetime

# get the latest timestamp from influxdb (the v1 client returns it as an RFC3339 string)
t_last = list(influx_client.query('SELECT last(STEPS), time FROM "gadgetbridge"').get_points())[0]['time']
# convert to epoch seconds to match gadgetbridge's integer TIMESTAMP column
t_last = int(datetime.fromisoformat(t_last.replace('Z', '+00:00')).timestamp())
# query only the rows with newer timestamps from sqlite
data = cursor.execute(f'SELECT STEPS FROM <YOUR_MAIN_TABLE> WHERE TIMESTAMP > {t_last}').fetchall()
```

Then wrap a few `try...except`s around it. Or don't - who cares what you do on your own VM :)

vanous commented 1 week ago
Poster
Collaborator

From what I can see, I have about 500,000 records per year; my database (since 2017) is now ~33MB. Processing it on a computer is not an issue, but processing it on-device is a different story. Fetching a year's worth of history from SQLite on the device takes ~15-19 seconds, and fetching all three years can take 60 seconds. I have quite exact data because I have been experimenting with the export to CSV that many people have been asking for, and it seems that this can, on-device, be a process taking several minutes. I have tried a few different things but didn't gain any speed, and this is only the retrieval; the conversion to CSV still takes some time...

As for manual synchronizing, I do the same thing - syncing with Syncthing - but recently I have not been very happy with this, for its complexity and consequences. For Syncthing to be used, you must have the same data on both ends (the expert setting to ignore deleted data is not considered safe and might be removed in the future, plus it can cause problems; see more [here](https://forum.syncthing.net/t/ignore-delete/15414)). But this means that your entire activity history sits in a folder available to any app at all times, so if anyone decided to check whether Gb is installed and then pull data from the export folder, they would have an easy job. The `ignore delete` setting in Syncthing is useful here, and I am planning to add `Delete/Clean export data folder` to the `Data management` screen in Gb to make cleaning it easier, but these are too many pain points and moving parts.

This is why having a simpler way of doing this is important. Some people prefer MQTT, some HTTP calls, some whatever other way. My tendency here would be to allow 3rd party integrations, but from what I can see, there would either be the Content Providers way, which must be baked into the manifest, meaning it is not simple for anyone to integrate with Gb without Gb adding an app as supported; another way would be to just broadcast the data, but then again any app could receive it... or add some kind of solution to encrypt the broadcast data with a user-set key, which feels smelly. But we should find a way.

tjhowse commented 1 week ago
Poster

Bear in mind that if we can do some preprocessing on the device we wouldn't be backfilling an entire database in one transaction - we'd be syncing approximately 5k records at a time, or about 22kB, until the remote DB is caught up with the local one. This will take time, but I suspect the bottleneck will be the REST API on the consumer rather than the device. Syncing the whole DB off the phone incurs the full network cost every time.
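The incremental part could be as simple as chunking the pending rows; a tiny sketch (the 5000 batch size is the illustrative figure above, not a Gadgetbridge constant):

```python
# Hypothetical batching of a backfill: chunk the pending rows so each
# upload transaction stays small and the remote DB catches up gradually.
def batches(rows, size=5000):
    """Yield successive fixed-size chunks of pending rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# e.g. a 12,000-row backlog becomes three uploads of 5000, 5000, and 2000 rows
```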

I'm starting to think your original idea of having a network-enabled variant of Gadgetbridge would be the cleanest solution. This way additional integrations can be added to GB itself to reduce external dependencies. The Syncthing solution works but depends on lots of external, fragile configuration to work perfectly. I just spent an hour trying to stop Syncthing from draining my phone's battery. If I have to factory reset my phone, I don't look forward to setting it all up again.

tjhowse commented 1 week ago
Poster

I have bashed out a quick sync script to dump the GB DB export to an InfluxDB server: https://github.com/tjhowse/gb2influxdb

This will work as a stopgap.

vanous added the
network companion app
label 1 week ago