Executive Summary
During my 7-month internship as a Database Manager & Data Analyst at Color Foods (April-October 2021), I developed Python automation scripts that improved sales department efficiency by 35%, saving 6 hours per week for 2 business users. This article breaks down the methodology, technical implementation, and ROI calculation for data process automation in a mid-sized food distribution company.
Key Results:
- 35% efficiency improvement in sales operations
- 6 hours/week saved per user (312 hours/year total)
- €12,480 annual cost savings
- 100% adoption rate across sales team
Context: The Manual Data Nightmare
Company Profile
Color Foods is a Marseille-based food distribution company with:
- €50M annual revenue
- 120 employees
- 2,500+ SKUs (product catalog)
- 350+ B2B clients
- ERP: Sage X3 (French market leader)
The Problem
The sales team spent 3 hours daily on repetitive data tasks:
1. Customer Order Analysis (1.5h/day)
   - Export orders from Sage X3 to Excel
   - Clean duplicate entries
   - Calculate order patterns (frequency, average basket size)
   - Identify dormant customers
2. Weekly Sales Reporting (1h/day)
   - Pull revenue data by product category
   - Compare week-over-week growth
   - Highlight top/bottom performers
   - Format for management presentation
3. Inventory Reconciliation (0.5h/day)
   - Cross-reference ERP stock levels with warehouse counts
   - Flag discrepancies for investigation
   - Generate reorder alerts
Total Time Waste:
- 3 hours/day × 2 users × 5 days/week = 30 hours/week
- Annualized: 1,560 hours/year
- At €40/hour: €62,400/year in labor cost
Pain Points:
- Error-prone manual data entry
- Inconsistent formatting across reports
- No audit trail for decisions
- Sales team frustrated doing "data grunt work" instead of selling
Solution Architecture
Tech Stack
# Core dependencies
pandas==1.3.5 # Data manipulation
sqlalchemy==1.4.27 # Database connection
openpyxl==3.0.9 # Excel automation
python-dotenv==0.19.2 # Configuration
schedule==1.1.0 # Task scheduling
# smtplib (stdlib)     # Email automation (standard library, no install needed)
Why Python?
- Already installed on company machines (vs. licensing PowerBI/Alteryx)
- IT team familiar with Python for other scripts
- Rich ecosystem for ERP integration
Why not Power BI?
- Sales team needed raw Excel outputs (not dashboards)
- Budget constraints (Power BI Pro licenses: €8.40/user/month)
- Python offered more flexibility for custom workflows
System Architecture
┌──────────────────┐
│   Sage X3 ERP    │
│   (SQL Server)   │
└────────┬─────────┘
         │ SQL Queries
         ▼
┌──────────────────┐
│  Python Scripts  │
│  - extract.py    │
│  - transform.py  │
│  - report.py     │
└────────┬─────────┘
         │ Pandas DataFrames
         ▼
┌──────────────────┐
│   Output Files   │
│   - Excel        │
│   - CSV          │
│   - Email        │
└──────────────────┘
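The diagram names three modules, while the scripts shown below each bundle the stages for convenience. Conceptually the flow is a plain extract-transform-report pipeline; the sketch below illustrates the wiring (module, function, and query names here are illustrative, not the repository's actual API):

import pandas as pd
from sqlalchemy import create_engine

def extract(engine, query):
    # Pull raw rows from Sage X3 (SQL Server) into a DataFrame
    return pd.read_sql(query, engine)

def transform(df):
    # Clean and aggregate: dedupe, type-cast, compute metrics
    return df.drop_duplicates()

def report(df, path):
    # Write the Excel output the sales team consumes
    df.to_excel(path, index=False)
    return path

if __name__ == '__main__':
    engine = create_engine('mssql+pyodbc://user:password@server/SAGE_X3?driver=ODBC+Driver+17+for+SQL+Server')
    raw = extract(engine, 'SELECT BPCORD_0, ORDDAT_0, INVNOT_0 FROM SORDER')
    report(transform(raw), 'daily_output.xlsx')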
Implementation: Three Automation Scripts
Script 1: Customer Order Analyzer
Before (Manual Process):
- Open Sage X3
- Navigate to Orders module
- Export last 30 days to CSV (5 clicks)
- Open Excel, clean data manually (20 min)
- Create pivot tables (15 min)
- Identify dormant customers (10 min)
- Email sales team (5 min)

Total: 50 minutes daily
After (Automated):
import pandas as pd
from sqlalchemy import create_engine
from datetime import datetime, timedelta
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders
from openpyxl.styles import PatternFill
# 1. Connect to Sage X3 database
engine = create_engine(
'mssql+pyodbc://user:password@server/SAGE_X3?driver=ODBC+Driver+17+for+SQL+Server'
)
# 2. Extract orders from last 30 days
query = """
SELECT
BPCORD_0 AS customer_id,
BPCNAM_0 AS customer_name,
ORDDAT_0 AS order_date,
INVNOT_0 AS order_amount,
INVNOT_0 / NULLIF(QTY_0, 0) AS avg_unit_price
FROM SORDER
WHERE ORDDAT_0 >= DATEADD(day, -30, GETDATE())
AND ORDSTA_0 IN (2, 3) -- Confirmed orders
ORDER BY ORDDAT_0 DESC
"""
df = pd.read_sql(query, engine)
# 3. Calculate customer metrics
customer_summary = df.groupby('customer_id').agg({
'order_date': 'count', # Order frequency
'order_amount': ['sum', 'mean'], # Total/avg revenue
'customer_name': 'first'
}).reset_index()
customer_summary.columns = ['customer_id', 'order_count', 'total_revenue', 'avg_order', 'customer_name']
# 4. Identify dormant customers (no orders in 30 days)
all_customers = pd.read_sql("SELECT BPCNUM_0, BPCNAM_0 FROM BPCUSTOMER WHERE BPCSTA_0 = 'A'", engine)
dormant = all_customers[~all_customers['BPCNUM_0'].isin(df['customer_id'])]
# 5. Flag high-value customers with declining orders
customer_summary['status'] = customer_summary.apply(
lambda row: 'High Value' if row['total_revenue'] > 5000 else
'At Risk' if row['order_count'] == 1 else
'Active',
axis=1
)
# 6. Export to Excel with formatting
output_file = f"Customer_Analysis_{datetime.now().strftime('%Y%m%d')}.xlsx"
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
customer_summary.to_excel(writer, sheet_name='Active Customers', index=False)
dormant.to_excel(writer, sheet_name='Dormant Customers', index=False)
# Apply conditional formatting
workbook = writer.book
worksheet = writer.sheets['Active Customers']
for row in worksheet.iter_rows(min_row=2, max_row=len(customer_summary)+1, min_col=5, max_col=5):
cell = row[0]
if cell.value == 'High Value':
cell.fill = PatternFill(start_color='00FF00', fill_type='solid')
elif cell.value == 'At Risk':
cell.fill = PatternFill(start_color='FF0000', fill_type='solid')
# 7. Email to sales team
def send_email(subject, body, attachment_path):
msg = MIMEMultipart()
msg['From'] = 'automation@colorfoods.fr'
msg['To'] = 'sales-team@colorfoods.fr'
msg['Subject'] = subject
msg.attach(MIMEText(body, 'plain'))
with open(attachment_path, 'rb') as attachment:
part = MIMEBase('application', 'octet-stream')
part.set_payload(attachment.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', f'attachment; filename={os.path.basename(attachment_path)}')
msg.attach(part)
server = smtplib.SMTP('smtp.office365.com', 587)
server.starttls()
server.login('automation@colorfoods.fr', os.getenv('EMAIL_PASSWORD'))
server.send_message(msg)
server.quit()
send_email(
subject='Daily Customer Analysis',
body=f"""Bonjour,
Here's today's customer analysis:
- {len(customer_summary)} active customers (last 30 days)
- {len(dormant)} dormant customers to follow up
- {len(customer_summary[customer_summary['status'] == 'At Risk'])} at-risk customers
Report attached.
""",
attachment_path=output_file
)
print(f"✅ Report generated and emailed: {output_file}")
Impact:
- Time saved: 50 min → 2 min (automation runs in background)
- Accuracy: 100% (no manual copy-paste errors)
- Consistency: Same format every day
Script 2: Weekly Sales Report Generator
Before: Sales manager spent 5 hours every Friday compiling weekly reports.
After:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Reuses the SQLAlchemy `engine` created in Script 1
# 1. Extract sales data for last 2 weeks
query = """
SELECT
DATEPART(week, INVDAT_0) AS week_number,
ITMREF_0 AS product_code,
ITMDES_0 AS product_name,
TSICOD_0 AS category,
SUM(QTY_0) AS quantity_sold,
SUM(INVNOT_0) AS revenue
FROM SINVOICE
WHERE INVDAT_0 >= DATEADD(week, -2, GETDATE())
GROUP BY DATEPART(week, INVDAT_0), ITMREF_0, ITMDES_0, TSICOD_0
ORDER BY week_number DESC, revenue DESC
"""
df = pd.read_sql(query, engine)
# 2. Calculate week-over-week growth
current_week = df[df['week_number'] == df['week_number'].max()]
previous_week = df[df['week_number'] == df['week_number'].max() - 1]
comparison = current_week.merge(
previous_week,
on='product_code',
suffixes=('_current', '_previous')
)
comparison['revenue_growth'] = (
(comparison['revenue_current'] - comparison['revenue_previous']) /
comparison['revenue_previous'] * 100
).round(2)
# 3. Identify top movers
top_gainers = comparison.nlargest(10, 'revenue_growth')[['product_name_current', 'revenue_growth']]
top_losers = comparison.nsmallest(10, 'revenue_growth')[['product_name_current', 'revenue_growth']]
# 4. Generate visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Revenue by category (current_week keeps the raw column names from the query)
category_revenue = current_week.groupby('category')['revenue'].sum().sort_values()
axes[0, 0].barh(category_revenue.index, category_revenue.values)
axes[0, 0].set_title('Revenue by Category - Current Week')
axes[0, 0].set_xlabel('Revenue (€)')
# Top 10 products
top_products = current_week.nlargest(10, 'revenue')
axes[0, 1].bar(top_products['product_name'], top_products['revenue'])
axes[0, 1].set_title('Top 10 Products - Revenue')
axes[0, 1].tick_params(axis='x', rotation=45)
# Week-over-week trend
axes[1, 0].plot(comparison['product_name_current'][:20], comparison['revenue_growth'][:20], marker='o')
axes[1, 0].axhline(y=0, color='r', linestyle='--')
axes[1, 0].set_title('Top 20 Products - WoW Growth %')
axes[1, 0].tick_params(axis='x', rotation=90)
# Summary metrics
summary_text = f"""
Total Revenue:
Current Week: €{current_week['revenue'].sum():,.2f}
Previous Week: €{previous_week['revenue'].sum():,.2f}
Growth: {((current_week['revenue'].sum() / previous_week['revenue'].sum() - 1) * 100):.2f}%
Top Gainer: {top_gainers.iloc[0]['product_name_current']} (+{top_gainers.iloc[0]['revenue_growth']:.1f}%)
Top Loser: {top_losers.iloc[0]['product_name_current']} ({top_losers.iloc[0]['revenue_growth']:.1f}%)
"""
axes[1, 1].text(0.1, 0.5, summary_text, fontsize=12, verticalalignment='center')
axes[1, 1].axis('off')
plt.tight_layout()
plt.savefig(f"Weekly_Sales_Report_{datetime.now().strftime('%Y%m%d')}.png", dpi=300)
print("✅ Weekly report generated")
Impact:
- Time saved: 5 hours → 15 minutes (weekly)
- Insight depth: Added trend visualizations that weren't feasible manually
- Stakeholder satisfaction: Management now requests additional metrics
Script 3: Inventory Reconciliation Bot
Before: Warehouse manager manually cross-checked ERP vs. physical counts in a spreadsheet (2 hours/week).
After:
import pandas as pd
from datetime import datetime

# Reuses `engine` and the send_email() helper defined in Script 1
# 1. Load data sources
erp_stock = pd.read_sql("SELECT ITMREF_0, LOC_0, QTYSTU_0 FROM STOCK", engine)
warehouse_counts = pd.read_excel('warehouse_physical_count.xlsx') # Updated daily by warehouse team
# 2. Merge and identify discrepancies
merged = erp_stock.merge(
warehouse_counts,
left_on=['ITMREF_0', 'LOC_0'],
right_on=['product_code', 'location'],
how='outer',
indicator=True
)
discrepancies = merged[
(merged['QTYSTU_0'] != merged['physical_count']) |
(merged['_merge'] != 'both')
].copy()  # copy so the new columns below don't trigger SettingWithCopyWarning
# 3. Categorize issues
discrepancies['issue_type'] = discrepancies.apply(
lambda row: 'Missing in ERP' if row['_merge'] == 'right_only' else
'Missing in Warehouse' if row['_merge'] == 'left_only' else
'Quantity Mismatch',
axis=1
)
discrepancies['variance'] = (discrepancies['physical_count'] - discrepancies['QTYSTU_0']).fillna(0)
# 4. Generate action report
critical = discrepancies[abs(discrepancies['variance']) > 100]  # Large discrepancies (more than 100 units)
critical.to_excel(
f"Inventory_Discrepancies_{datetime.now().strftime('%Y%m%d')}.xlsx",
index=False
)
# 5. Alert for critical items
if len(critical) > 0:
send_email(
subject=f'⚠️ {len(critical)} Critical Inventory Discrepancies Detected',
body=f"""
Critical stock discrepancies detected:
{critical[['product_code', 'location', 'QTYSTU_0', 'physical_count', 'variance']].to_string()}
Please investigate immediately.
""",
attachment_path=f"Inventory_Discrepancies_{datetime.now().strftime('%Y%m%d')}.xlsx"
)
Impact:
- Time saved: 2 hours/week → 10 minutes/week
- Error detection: Found €15,000 in unaccounted inventory in first month
- Proactive alerts: Issues caught same-day instead of monthly
Deployment & Scheduling
Task Scheduler (Windows)
# daily_automation.bat
@echo off
cd C:\ColorFoods\automation
python customer_analyzer.py >> logs\daily.log 2>&1
Windows Task Scheduler Configuration:
- Trigger: Daily at 7:00 AM (before sales team arrives)
- Action: Run daily_automation.bat
- Conditions: Only run if network is available
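If you prefer keeping everything in Python rather than relying on Task Scheduler, the schedule library listed in the dependencies can drive the same jobs from a long-running process. A minimal sketch (script file names are placeholders) looks like this:

import subprocess
import time
import schedule

def run_daily_jobs():
    # Run each automation script in sequence; file names are placeholders
    for script in ('customer_analyzer.py', 'weekly_report.py', 'inventory_recon.py'):
        subprocess.run(['python', script], check=True)

schedule.every().day.at('07:00').do(run_daily_jobs)  # same 7:00 AM trigger as Task Scheduler

while True:
    schedule.run_pending()
    time.sleep(60)  # check once per minute

Task Scheduler remained the simpler choice here since it survives reboots without a babysitting process, but the library is handy on machines where you can't touch the scheduler.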
Error Handling & Logging
import logging
from datetime import datetime
logging.basicConfig(
filename=f'logs/automation_{datetime.now().strftime("%Y%m%d")}.log',
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
try:
df = pd.read_sql(query, engine)
logging.info(f"Successfully extracted {len(df)} records")
except Exception as e:
logging.error(f"Database connection failed: {str(e)}")
send_alert_email("🚨 Automation Error", str(e))
raise
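The send_alert_email() helper called above isn't shown in this excerpt; a minimal version reusing the SMTP settings from Script 1 could look like the sketch below (the recipient address is a placeholder):

import os
import smtplib
from email.mime.text import MIMEText

def send_alert_email(subject, body):
    # Plain-text alert to whoever maintains the automation
    msg = MIMEText(body, 'plain')
    msg['From'] = 'automation@colorfoods.fr'
    msg['To'] = 'it-support@colorfoods.fr'  # placeholder recipient
    msg['Subject'] = subject
    with smtplib.SMTP('smtp.office365.com', 587) as server:
        server.starttls()
        server.login('automation@colorfoods.fr', os.getenv('EMAIL_PASSWORD'))
        server.send_message(msg)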
Measuring ROI
Time Savings Calculation
| Task | Manual (hours/week) | Automated (hours/week) | Savings |
|---|---|---|---|
| Customer Analysis | 5 × 2 users = 10 | 0.5 | 9.5h |
| Weekly Report | 5 | 0.25 | 4.75h |
| Inventory Recon | 2 | 0.17 | 1.83h |
| Total | 17h/week | 0.92h/week | 16.08h/week |
Annual Savings:
- Time: 16.08 hours/week × 52 weeks = 836 hours/year
- At €40/hour: €33,440/year
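The arithmetic is simple enough to sanity-check in a few lines, using the same €40/hour rate assumed throughout:

weekly_hours_saved = 9.5 + 4.75 + 1.83                # from the table above (16.08 h/week)
annual_hours_saved = round(weekly_hours_saved * 52)   # ~836 h/year
hourly_rate_eur = 40
annual_savings_eur = annual_hours_saved * hourly_rate_eur
print(f'{annual_hours_saved} h/year -> €{annual_savings_eur:,}/year')  # 836 h/year -> €33,440/year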
Efficiency Improvement Calculation
Baseline productivity (April 2021):
- Sales team: 40 hours/week billable
- Data tasks: 15 hours/week (37.5% of time)
- Actual selling time: 25 hours/week
Post-automation (October 2021):
- Data tasks: 0.92 hours/week (2.3% of time)
- Actual selling time: 39.08 hours/week
Efficiency gain: (39.08 - 25) / 25 = 56% increase in selling time
However, I conservatively reported 35% because:
- Some saved time went to training on new tools
- Initial automation setup period (first month)
- Not all data tasks were eliminated (strategic analysis still manual)
Additional Benefits (Unquantified)
- Error reduction: Zero data entry mistakes after automation
- Faster decision-making: Reports available 8 hours earlier each day
- Audit trail: All queries logged for compliance
- Scalability: Same scripts handle 2x customer volume without additional effort
Lessons Learned
What Worked
- Shadowing users first: Spent 1 week observing manual process before coding
- Incremental rollout: Deployed one script at a time for user testing
- Excel output format: Users preferred familiar Excel over "fancy dashboards"
- Over-communication: Daily Slack updates during testing phase built trust
Challenges
- ERP schema documentation: Sage X3 table names are cryptic (BPCORD_0?). Took 2 weeks to reverse-engineer.
- Database permissions: IT initially blocked SQL access. Required formal security review.
- Change management: Sales team initially skeptical ("Will this replace us?"). Required 1:1 training sessions.
Mistakes
- No version control initially: Lost 3 days of work to an accidental overwrite
  - Fix: Implemented Git after week 2
- Hardcoded credentials: Security audit flagged this
  - Fix: Migrated to a .env file with encrypted passwords (see the sketch after this list)
- No backup strategy: Script failure on a Friday deleted the report
  - Fix: Added file backups to the network drive
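For reference, the credentials fix relied on python-dotenv (already in the dependency list): secrets live in a .env file that is excluded from Git and loaded at runtime. A minimal sketch, with illustrative variable names:

# .env (listed in .gitignore):
#   SAGE_DB_URL=mssql+pyodbc://user:***@server/SAGE_X3?driver=ODBC+Driver+17+for+SQL+Server
#   EMAIL_PASSWORD=***

import os
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # reads the .env file sitting next to the script
engine = create_engine(os.getenv('SAGE_DB_URL'))
email_password = os.getenv('EMAIL_PASSWORD')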
Replicability: How to Implement at Your Company
Week 1: Discovery
- Interview 3-5 users doing repetitive data tasks
- Time each step in their workflow
- Identify highest-impact process (saves most time)
Week 2: Proof of Concept
- Build MVP script for one workflow
- Test with 1 user
- Gather feedback
Week 3-4: Refinement
- Add error handling
- Create user documentation
- Set up logging
Week 5: Deployment
- Schedule tasks
- Train users
- Monitor for issues
Week 6+: Expansion
- Automate next-highest-impact process
- Measure ROI
- Report wins to management
Conclusion
Automating repetitive data tasks isn't just about saving timeβit's about freeing skilled workers to do what they do best. At Color Foods, sales professionals went from data janitors to strategic sellers, and the company saw a measurable 35% efficiency gain.
Three Key Takeaways:
- Start small: Automate one painful process, prove ROI, then scale
- Prioritize user experience: Fancy tech doesn't matter if users won't adopt it
- Measure everything: Time savings, error rates, user satisfactionβdata justifies investment
The best automation is invisible: Users should feel like their job got easier, not that they're using "automation software."
Code Repository
GitHub: github.com/nicolasavril/erp-automation
Includes:
- Anonymized SQL queries
- Scheduling templates
- ROI calculator spreadsheet
About the Author
Nicolas Avril is a Data Scientist & AI Engineer with experience in process automation, business intelligence, and ERP integration. During his internship at Color Foods, he developed Python automation scripts that improved sales efficiency by 35%.
Connect: LinkedIn | Portfolio | GitHub
Have you automated data processes at your company? What challenges did you face? Share your experience in the comments!
If this article helped you, follow me for more practical guides on Data Science, automation, and business process optimization.