Hi, what is the best way to keep the databases I use across different projects? I work with a lot of CSVs that I have to prepare every time I use them (I just copy and paste the code from other projects), and I would like to make a module I can import that contains all the processing for each database. For example, for this database I usually do columns = [(configuration of, my columns)], names = [names], dates = [list of date columns], dtypes = {column: type},

then database_1 = pd.read_fwf(**kwargs), database_2 = pd.read_fwf(**kwargs), database_3 = pd.read_fwf(**kwargs)…

Then database = pd.concat([database_1…])

But I would like to have a module that I could import and that has all my databases and their ETL configuration in it, so I could just do something like ‘database = my_module.database’ to import the database, without repeating that whole process every time.

Thanks for any help.

  • gedhrel@lemmy.world
    5 months ago

    If things are changing a bit each month, then in your module rather than a plain variable assignment

    database = ...
    

    you might want a function that you can pass in parameters to represent the things that can change:

    def database(dir, ...):
        ...
        return ...
    

    Then you can call it like this:

    from database import database
    db = database("/some/path")
    

    … hope that makes some sense.
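
To make the function idea a bit more concrete, here is a rough sketch of what such a `database.py` module could look like. The column layout, column names, dtypes and the `pattern` parameter are all made up for illustration, so swap in your real configuration; only `pd.read_fwf` and `pd.concat` come from the original post.

```python
from pathlib import Path

import pandas as pd

# Hypothetical layout: replace these with your real column configuration.
COLSPECS = [(0, 4), (4, 14), (14, 20)]  # (start, end) character offsets
NAMES = ["id", "date", "amount"]
DTYPES = {"id": "int64", "amount": "float64"}
DATE_COLUMNS = ["date"]


def database(directory, pattern="*.txt"):
    """Read every fixed-width file in `directory` into one DataFrame."""
    files = sorted(Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")

    # The same read_fwf configuration is applied to every file.
    frames = [
        pd.read_fwf(
            path,
            colspecs=COLSPECS,
            names=NAMES,
            dtype=DTYPES,
            parse_dates=DATE_COLUMNS,
        )
        for path in files
    ]
    # Stack the per-file frames and renumber the rows from 0.
    return pd.concat(frames, ignore_index=True)
```

With this saved as `database.py` somewhere importable, the call from above (`from database import database`, then `db = database("/some/path")`) would return the concatenated frame. Keeping the configuration in module-level constants means each project imports the same settings instead of copy-pasting them.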