Python Pandas Load Excel To Oracle

# Excel to Oracle

**pandas.DataFrame.to_sql¶**

DataFrame.to_sql(*self*, *name*, *con*, *schema=None*, *if_exists='fail'*, *index=True*, *index_label=None*, *chunksize=None*, *dtype=None*, *method=None*)

Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten.

param	mean
name : string	Name of SQL table.
name : stringName of SQL table.con : sqlalchemy.engine.Engine or sqlite3.Connection	Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects.
schema : string, optional	Specify the schema (if database flavor supports this). If None, use default schema.
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’	How to behave if the table already exists.fail: Raise a ValueError. replace: Drop the table before inserting new values. append: Insert new values to the existing table.
index : bool, default True	Write DataFrame index as a column. Uses index_label as the column name in the table.
index_label : string or sequence, default None	Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional	Rows will be written in batches of this size at a time. By default, all rows will be written at once.
dtype : dict, optional	Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode.
method : {None, ‘multi’, callable}, default None	Controls the SQL insertion clause used: None : Uses standard SQL `INSERT` clause (one per row). ‘multi’: Pass multiple values in a single `INSERT` clause. callable with signature `(pd_table, conn, keys, data_iter)` .

from sqlalchemy import create_engine
from sqlalchemy.dialects.oracle import VARCHAR2
import pandas as pd
import time

start = time.time()
df = pd.read_excel('./test.xls') # use openpyxl read excel to DataFrame

dtype = {c: VARCHAR2(df[c].str.len().max())
         for c in df.columns[df.dtypes == 'object'].tolist()} # convert object to oracle VARCHAR2

dev = 'name:password@instance'
conn_string = f'oracle+cx_oracle://{dev}'
engine = create_engine(conn_string, echo=False)
df.to_sql(name='babb_temp', con=engine, index=False, if_exists='append',
          dtype=dtype, chunksize=10**4)
# res = engine.execute("SELECT * FROM babb_temp").fetchall()

print(time.time() - start)

# CSV to Excel

Convert CSV to Excel, and meet the following requirements.

Keep the leading zero.
The date format in Excel is DD/MM/YYYY .

PO Line Number,Date
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025
01,24-Jan-2025

import pandas as pd
from pathlib import Path

path = Path('./ASN.csv')

df = pd.read_csv(path, dtype='str') # Keep the leading zero of 01
df['Date'] = pd.to_datetime(df['Date'], format="%d-%b-%Y") # Convert 24-Jan-2025 to pandas datetime

# Write pandas datetime to excel date using format DD/MM/YYYY
# First of all, you need "pip to install xlsxwriter", otherwise it will be ineffective and there will be no error.
with pd.ExcelWriter(path.stem + '.xlsx', datetime_format='DD/MM/YYYY') as writer: 
   df.to_excel(writer, index=False, sheet_name='ASN')

# BUG: ExcelWriter set_column with num_format datetime doesn’t work (but it works for xlsxwriter)

# Reproducible Example

from datetime import datetime
import pandas as pd

writer = pd.ExcelWriter('datetime_bug.xlsx', engine='xlsxwriter')
workbook = writer.book

# datetime from pandas is not converted with the specified num_format
data = pd.DataFrame([[0.05, datetime(year=2020, month=1, day=1)]], columns=["percentage", "datetime_date"])

data.to_excel(writer, sheet_name="test_sheet", index=False)

percentage_format = workbook.add_format({'num_format':'0.00%'})
date_format = workbook.add_format({'num_format':'dd.mm.yyyy'})

writer.sheets["test_sheet"].set_column('A:A', width=None, cell_format=percentage_format)
writer.sheets["test_sheet"].set_column('B:B', width=None, cell_format=date_format)

workbook.close()

# Issue Description

I created an Excel sheet using pd.to_excel() . The data contains one column with a float number and one column with a datetime.datetime object.

Then, I want to set the column type using the num_format parameter in set_column .

This works for the float value, which I want in the percentage format.
However, it does not work for the datetime value. The value is in the wrong format “01.01.2020 00:00:00” (German date). The value in the Excel file is in the same wrong format, whether I run set_column() for the date column or not.

I have tried to circle in the problem by trying other variations, which turns out that they do work. However, those would be only workarounds for our use case.

Using the parameter datetime_format='dd.mm.yyyy' in the pd.ExcelWriter constructor puts the date into the right format.
Instead of writing the sheet from a DataFrame, using the xlsxwriter functions works fine:

import xlsxwriter
from datetime import datetime

workbook = xlsxwriter.Workbook('datetime_bug.xlsx')
worksheet = workbook.add_worksheet()

# using the xlsxwriter functions works as intended
worksheet.write('A1', 0.05)
worksheet.write('B1', datetime(year=2020, month=1, day=1))

percentage_format = workbook.add_format({'num_format':'0.00%'})
date_format = workbook.add_format({'num_format':'dd.mm.yyyy'})

worksheet.set_column('A:A', width=None, cell_format=percentage_format)
worksheet.set_column('B:B', width=None, cell_format=date_format)

workbook.close()

# Expected Behavior

The value is in the format “01.01.2020” when using set_column with pd.ExcelWriter .

# Why

set_column does not change the format; it changes the default format:

The cell_format parameter will be applied to any cells in the column that don’t have a format.

Since pandas is setting the format for date cells, it has no effect. As far as I can tell there is no easy way to overwrite an existing format with xlsxwriter. The best resolution I see for this within pandas is to use datetime_format as you mentioned.

python