I work with you to understand your needs and carry out an initial feasibility analysis to establish what might be possible and how it could be achieved, and to agree realistic objectives.
Relevant data is collected from all available internal and external sources and checked for validity and errors. Sources can include databases, Excel spreadsheets, CSV files, websites, etc., and the data can be numerical or image-based.
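As a rough sketch of this collection-and-checking step, assuming pandas is used (the data, file contents and column names below are invented; in practice the source would be a real file, database query or web download):

```python
import io
import pandas as pd

# An in-memory buffer stands in for a hypothetical CSV file
# (e.g. "sales.csv") so this sketch is self-contained.
raw = io.StringIO(
    "order_id,region,amount\n"
    "1001,North,250.0\n"
    "1002,South,\n"
    "1002,South,\n"
    "1003,East,-40.0\n"
)

df = pd.read_csv(raw)

# Basic validity checks: missing values, duplicate rows,
# and values outside an expected range (amounts should be positive).
n_missing = int(df["amount"].isna().sum())
n_duplicates = int(df.duplicated().sum())
n_negative = int((df["amount"] < 0).sum())
```

The same checks apply whatever the source; only the loading call changes (e.g. `pd.read_excel` or `pd.read_sql`).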
Data is converted to common formats, duplicates are removed, errors are corrected, and missing data is handled (replaced, removed, or imputed).
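A minimal illustration of this cleaning step, again assuming pandas (the records and column names are invented): duplicates are dropped and a missing value is imputed with the column median.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records containing a duplicated row and a missing value.
df = pd.DataFrame({
    "customer": ["A", "B", "B", "C"],
    "spend": [120.0, np.nan, np.nan, 80.0],
})

# Drop exact duplicates, then impute the remaining missing value
# with the median of the observed values.
clean = df.drop_duplicates().copy()
clean["spend"] = clean["spend"].fillna(clean["spend"].median())
```

Whether a missing value is better replaced, removed, or imputed depends on the data and the question being asked; the median is just one common choice.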
Using the cleaned data, exploratory data analysis (EDA) aims to summarize the main characteristics, spot patterns, trends and correlations, identify anomalies, and test hypotheses. Data from different sources is combined as necessary.
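A small sketch of combining two sources and summarizing the result, assuming pandas (both tables and their columns are invented for illustration):

```python
import pandas as pd

# Two hypothetical sources: a transaction table and a customer lookup table.
sales = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [100.0, 250.0, 150.0, 90.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "East"],
})

# Combine the sources on the shared key, then summarize
# and look for group-level patterns.
combined = sales.merge(customers, on="customer_id", how="left")
summary = combined["amount"].describe()
by_region = combined.groupby("region")["amount"].sum()
```

From here, plots of distributions and pairwise correlations would typically follow to surface trends and anomalies.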
Both traditional statistical and machine learning models can be developed that use known results and historical data to create forecasts for new or future events and behaviours, and to identify the most important features that determine those outcomes.
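As one hedged illustration of this modelling step, using Scikit-learn (named above): a random forest is fitted to synthetic stand-in data with known outcomes, used to forecast unseen cases, and queried for feature importance. The data here is generated purely for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical data with known outcomes:
# features 0 and 1 drive the outcome; feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out a test set to check forecasts on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy on unseen data, and a ranking of which features
# most strongly determine the outcome.
accuracy = model.score(X_test, y_test)
importances = model.feature_importances_
```

Many other model families fit the same pattern; the choice depends on the data and the objectives agreed at the outset.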
A much better understanding of what determines outcomes has the potential to uncover actionable insights hidden in your data. These insights can be used to guide decision making and planning, leading to improved performance and lower risk.
Python is the principal programming language I use for analysis and modelling. It has an extensive ecosystem of libraries, particularly for data science, such as NumPy, Pandas, SciPy, Scikit-learn and TensorFlow, and for creating publication-quality graphics. It is extremely versatile, runs on most operating systems, and integrates well with other software. Python scripts can be developed to support many of your data analysis needs.