Scoping a Data Science Project

One of the hardest aspects of data science can be defining the project. Data is messy, and desired outcomes often shift as a project progresses.

There is no perfect way to define a data science project. What follows are lessons we’ve learned as The Data Mine has grown, along with advice that mentors from industry have shared with us.

If you have suggestions to add to this list, please email us at datamine-help@purdue.edu.

Data Science Project Checklist

  • What level of experience is needed for the project?

    • Many of our students are early in their academic careers. If you need very specific skills, a sponsored research project may be the way to go.

    • Sponsored research projects are run by teams selected for their skills and have more specific goals and expectations.

    • In comparison, a traditional Data Mine project is designed to challenge students and help them learn, but it is primarily about building their experience with analytics and their familiarity with your company and its ways of working.

  • What kind of technology is required?

    • The Data Mine team is happy to work with you to accommodate different computing environments at Purdue. We primarily work in Unix environments on our high performance computing (HPC) systems. However, we recognize that technology is varied and ever-changing, and we try to support everything we can.

    • If you have any questions about The Data Mine’s technical environment, please see Getting Started with Data Science.

  • What data will the students use?

    • Data is both the fuel and the biggest challenge to a project. Small amounts of data or incomplete data sets can lead to challenges that the students aren’t able to address.

  • How will the students learn about the data and models?

    • Students often join a project eager to learn but early in their analytics careers.

    • Providing examples of models, data dictionaries, and steps to learn different concepts can be very helpful to the students.

  • Is the project more focused on student experience or developing new products?

    • There is nothing wrong with projects focused on advancing a cutting-edge application or methodology. However, these often fit better as sponsored research.

    • Students love engaging with companies, and every project has value, but it’s important to consider which outcome would benefit your company most.

If you need help answering these questions or brainstorming a project, please reach out to The Data Mine team at datamine-help@purdue.edu.

We’d be happy to meet with you and discuss our options!