Replace numpy.ndarray with pandas.DataFrame #1

Open
opened 2 years ago by steko · 1 comment
steko commented 2 years ago
Owner

Our current codebase is built entirely on `ndarray` subclasses. Most functions are littered with typical NumPy idioms like `curve[-1,0]` or `hpd_curve[:,0]`, whose positional indexes are opaque to new contributors and hard to follow even for the current maintainer.

We should explore introducing classes based on `DataFrame` with labelled columns, to make the entire codebase more readable.
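To make the readability argument concrete, here is a minimal sketch comparing positional indexing with labelled columns. The three-column layout and the column names (`cal_age`, `c14_age`, `sigma`) are assumptions made up for illustration, not the actual layout used in the codebase.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column curve: calendar age, radiocarbon age, error.
curve = np.array([
    [100.0, 150.0, 10.0],
    [200.0, 230.0, 12.0],
    [300.0, 310.0, 15.0],
])

# Positional indexing: the reader has to know what column 0 means.
last_cal_age_np = curve[-1, 0]
cal_ages_np = curve[:, 0]

# The same data with labelled columns (labels are illustrative only).
curve_df = pd.DataFrame(curve, columns=["cal_age", "c14_age", "sigma"])
last_cal_age_df = curve_df["cal_age"].iloc[-1]
cal_ages_df = curve_df["cal_age"].to_numpy()
```

Both versions return the same values; the only difference is whether the column is identified by position or by name.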
steko added the enhancement label 2 years ago
steko self-assigned this 2 years ago
Poster
Owner

Based on the tests I have run so far, using `pandas.DataFrame` doesn't necessarily make the code more readable or faster.

First of all, many operations like resampling and interpolating assume that the underlying data is a time series. That is conceptually correct, but the timespan supported by pandas timestamps is much shorter than what we need to handle dates 50000 years before present.
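A quick sketch of the limitation, assuming the default nanosecond-resolution `pd.Timestamp`: its bounds can be read directly from `pd.Timestamp.min` and `pd.Timestamp.max`. The second half shows one possible workaround, treating years BP as plain numbers and interpolating with `np.interp`; the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Bounds of the default nanosecond-resolution Timestamp.
earliest = pd.Timestamp.min
latest = pd.Timestamp.max
span_years = latest.year - earliest.year  # a few centuries, far short of 50000 years

# Possible workaround: keep ages as plain floats (years BP) and
# interpolate numerically instead of relying on a datetime index.
# These values are made up for the example.
cal_bp = np.array([49000.0, 49500.0, 50000.0])
c14_bp = np.array([45000.0, 45600.0, 46300.0])
interpolated = np.interp(49250.0, cal_bp, c14_bp)
```

Note that `np.interp` expects the x-coordinates (`cal_bp` here) to be increasing.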

Currently our main classes, like `CalibrationCurve` and `CalendarAge`, are subclasses of `numpy.ndarray` that execute one or more functions when they are instantiated. Doing the same with pandas seems more convoluted.
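For readers unfamiliar with the pattern, this is roughly what an `ndarray` subclass that does work at instantiation looks like. `SortedCurve` and its `title` attribute are hypothetical and only illustrate the mechanism; they are not the project's `CalibrationCurve` or `CalendarAge`.

```python
import numpy as np


class SortedCurve(np.ndarray):
    """Hypothetical ndarray subclass that sorts its rows by the first column."""

    def __new__(cls, data, title=""):
        arr = np.asarray(data, dtype=float)
        # Work done at instantiation: order rows by the first column.
        arr = arr[np.argsort(arr[:, 0])]
        obj = arr.view(cls)
        obj.title = title
        return obj

    def __array_finalize__(self, obj):
        # Also called for views and slices, so the attribute is propagated.
        if obj is None:
            return
        self.title = getattr(obj, "title", "")


curve = SortedCurve([[300.0, 310.0], [100.0, 150.0], [200.0, 230.0]], title="demo")
tail = curve[1:]  # slicing keeps the subclass and its attribute
```

pandas documents its own subclassing hooks (`_constructor`, `_metadata`), which is an additional layer to learn compared with the `__new__` / `__array_finalize__` pair shown here.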

So, even though I haven't reached a conclusion yet, I think it may be more productive to use [index arrays](https://numpy.org/doc/stable/user/basics.indexing.html#index-arrays) with the current numpy-based implementation.
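One way this could look, as a rough sketch only: named constants for the column positions, combined with an integer index array to pick several columns at once. The constant names and the column layout are assumptions for illustration.

```python
import numpy as np

# Hypothetical column layout, named once and reused everywhere.
CAL_AGE, C14_AGE, SIGMA = 0, 1, 2

curve = np.array([
    [100.0, 150.0, 10.0],
    [200.0, 230.0, 12.0],
    [300.0, 310.0, 15.0],
])

# Same operations as curve[-1, 0] and curve[:, 0], but self-describing.
last_cal_age = curve[-1, CAL_AGE]
cal_ages = curve[:, CAL_AGE]

# An integer index array selects several columns in one step.
age_and_sigma = curve[:, np.array([CAL_AGE, SIGMA])]
```

This keeps the existing `ndarray` subclasses untouched while removing the bare integers from the call sites.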