DataEd 2023

DataEd 2023 Program

We're excited to share our full program with you below! Find us in room Regency G.

The proceedings are now available in the ACM Digital Library!

Time	Program	Presenter and Title	Materials
8:30 AM	Welcome
8:45 AM	Opening Keynote and discussion	Sourav Bhowmick - Human Learners of Relational Query Processing: Who Cares?	paper, slides
10:00 AM	Paper session 1	Daniel Kocher - Feedforward-Aided Course Designs for Similarity Search	paper
		Abdussalam Alawini - Student’s Learning Challenges with Relational, Document, and Graph Query Languages	paper
10:40 AM	Break (SIGMOD break at 10:30)
11:15 AM	Paper session on Tools	Sihem Amer-Yahia - Adaptive Test Recommendation for Mastery Learning	paper, slides
		Ruben Mayer - pTA: An Automated Teaching Assistant for Lab Courses	paper, slides
		Sophia Yang - Mining SQL Problem Solving Patterns using Advanced Sequence Processing Algorithms	paper
12:15 PM	Sponsor talk	Samuel Watson - Relational AI
12:30 PM	Lunch break
1:30 PM	Keynote 2 and discussion	Toni Taipalus - SQL: A Trojan Horse Hiding a Decathlon of Complexities	paper, slides
2:45 PM	Break
3:30 PM	Paper session on Course Design	Erik Golen - Offering Data Science Education to Non-Computing Majors	paper, slides
		Sean Kross - Teaching Data Science by Visualizing Data Table Transformations: Pandas Tutor for Python, Tidy Data Tutor for R, and SQL Tutor	paper, slides
		Michael J. Mior - Relational Playground: Teaching the Duality of Relational Algebra and SQL	paper, slides
4:30 PM	Discussion on where to go from here
5:00 PM	Closing

Keynotes

We're very happy to announce our two keynote speakers: Sourav Bhowmick and Toni Taipalus. Read more about their talks below!

Sourav Bhowmick

Human Learners of Relational Query Processing: Who Cares?

There is increasing attention on lifelong learning of data-related topics, primarily due to the data-driven world and rapidly changing technological landscape. This has increased the importance of database-related courses in recent times. One of the key learning goals of adult learners taking a database systems course is to understand how SQL queries are processed in an RDBMS in practice. Most database courses supplement traditional modes of teaching with off-the-shelf RDBMS to provide hands-on opportunities to learn database concepts used in practice. Notably, the data management community has traditionally directed its attention primarily to the needs of enterprise users rather than of learners. Consequently, these RDBMS are not designed for effective and efficient pedagogical support. In this keynote, we present a vision that calls for action to direct some of our attention to learners in order to build effective and efficient technological support to supplement learning of relational query processing. We focus on research challenges in this novel space that are motivated by challenges faced by real-world learners and discuss state-of-the-art tools and techniques that are deployed in practice to mitigate some of them. We identify opportunities for the data management community to make database education data-driven as well as opportunities to create technologies that are inclusive, i.e., facilitating special-needs adults (e.g., learners with autism spectrum disorder (ASD)) to learn database systems. In summary, realizing our vision has tremendous potential for real-world impact on something that is very personal to us – education.

Bio: Sourav S. Bhowmick is an Associate Professor in the School of Computer Science and Engineering (SCSE), Nanyang Technological University, Singapore. His core research expertise is in data management, human-data interaction, and data analytics. His research has appeared in premium venues such as ACM SIGMOD, VLDB, and VLDB Journal. He is co-recipient of Best Paper Awards in ACM CIKM 2004, ACM BCB 2011, and VLDB 2021. He is also co-recipient of the 2021 ACM SIGMOD Research Highlights Award. Sourav is serving as a member of the SIGMOD Executive Committee, a regular member of the PVLDB advisory board, and a co-lead in the committee for Diversity and Inclusion in Database Conference Venues. He is a co-recipient of several service awards including VLDB Service Award in 2018, Distinguished AE Award in SIGMOD 2021 and VLDB 2022, and Distinguished Reviewer Award in 2020. He is the inventor of CLOSET. Sourav was inducted into Distinguished Members of the ACM in 2020. He is a strong advocate of research that directly or indirectly impacts end users.

Toni Taipalus

SQL: A Trojan Horse Hiding a Decathlon of Complexities

Despite its age, SQL is still a widely sought skill among software developers and data engineers, which makes learning SQL a tempting prospect. Several online courses and tutorials may even inspire learners by stating that SQL is a simple and easy language to learn. This impression might also be strengthened by looking at simple SQL statements that read close to English, in contrast to most programming languages. In this keynote, I will present ten complexities hiding behind SQL's initial appeal, and my experiences and possible solutions in mitigating these complexities in data systems education.

Bio: Toni Taipalus (PhD) is a teacher and a researcher at University of Jyväskylä, Finland. He completed his PhD in information systems, focusing on query language education. Currently, he is bridging the gap between database management systems and human-computer interaction with the goal of facilitating data systems education.

Presentation abstracts

Feedforward-Aided Course Designs for Similarity Search - Thomas Hütter and Daniel Kocher

In this paper, we present two feedforward-aided designs for a Master’s level course on similarity search based on different teaching methods. In project-based learning, the students are encouraged to learn autonomously while working on non-trivial real-world problems. Students address a problem over several months by creating an artifact (e.g., by implementing an algorithm). A similar but different teaching method is task-based learning. Rather than working on long-lasting projects, students work on smaller (but useful) tasks. In both course designs, we employ an auto-grader to provide students with automated and instant feedforward in a continuous manner, which allows them to improve their performance autonomously. We discuss and share our experiences with applying both methods in class. Furthermore, we give insights on the course evaluation based on the students’ feedback, share our lessons learned, and analyze the students’ grades.

Adaptive Test Recommendation for Mastery Learning - Nassim Bouarour, Idir Benouaret, Cédric d'Ham and Sihem Amer-Yahia

We tackle the problem of recommending tests to learners to achieve upskilling. Our work is grounded in two learning theories: mastery learning, an instructional strategy that guides learners by providing them tests of increasing difficulty, reviewing their test results, and iterating until they reach a level of mastery; and Flow Theory, which identifies different test zones (frustration, learnable, flow, and boredom) to determine the best 𝑘 tests to recommend to a learner. We formalize the AdUp Problem and develop a multi-objective optimization solution that adapts the difficulty of recommended tests to the learner’s predicted performance, aptitude, and skill gap. We leverage existing models to simulate learner behavior and run experiments to demonstrate that our formalization is best to attain skill mastery. We discuss open research directions including the applicability of reinforcement learning and the recommendation of peers in collaborative projects.

pTA: An Automated Teaching Assistant for Lab Courses - Jawad Tahir, Raj Mandal, Olha Stefanova, Hans-Arno Jacobsen, Christoph Doblander and Ruben Mayer

Lab courses play a crucial role in enabling students to gain a deeper understanding of theoretical concepts, but these courses require a significant effort from the course’s organizational staff, such as instructors and teaching assistants. To address this challenge, we developed pTA, an acronym for programmable teaching assistant, which automates the functional evaluation of students’ submissions for the Cloud Databases course taught at the Technical University of Munich (TUM). pTA reduces the staff workload and provides instant feedback to students, thereby enhancing their understanding of the project specifications. Additionally, pTA includes a live leaderboard that provides a gamification element that makes the course more interactive and engaging for students. It is deployed on a Kubernetes cluster that ensures scalability with evaluation requests. In this paper, we describe the course’s learning milestones and provide an overview of pTA’s architecture and features. The system’s efficacy was evaluated at TUM and the University of Toronto, where it was deployed in two similar courses. Our findings show that pTA reduced the staff workload by at least 75%, lowered the operating cost, and increased course capacity in terms of the number of students. Furthermore, our study suggests that students exhibit more interest in courses that integrate interactive learning systems and gamification elements.

Student’s Learning Challenges with Relational, Document, and Graph Query Languages - Ridha Alkhabaz, Zepei Li, Sophia Yang and Abdussalam Alawini

As the need for database management skills continues to grow, there is an increasing demand for education on database models and their corresponding query languages. However, the body of research addressing the difficulties encountered by novice learners when working with query languages in database systems is still limited. In this study, we examined over 357,215 submissions from 462 students’ homework problems during the Fall 2022 semester covering concepts in SQL, MongoDB, and Neo4j query languages. Our analysis, which breaks down the most common syntax errors by concept, confirms previous research and demonstrates that certain data operations pose challenges to students across different database systems. Specifically, we found that aggregation operations and join operations were particularly difficult for students, which aligns with prior SQL education research. Therefore, we suggest that instructors consider incorporating visuals and assignments that enable students to build mental models for different database models.

Mining SQL Problem Solving Patterns using Advanced Sequence Processing Algorithms - Sophia Yang, Geoffrey L. Herman and Abdussalam Alawini

SQL is a crucial language for managing relational database systems, and is an essential skill for individuals in roles such as researchers, developers, and business professionals who work with databases. However, learning SQL can be a challenge, presenting an opportunity to study the various methods students use to arrive at semantically equivalent SQL queries. In this study, we examined students' SQL submissions to homework assignments in the Database Systems course offered to upper-level undergraduate and graduate students at the University of Illinois Urbana-Champaign during the Fall 2022 semester. Our goal was to understand how students arrive at SQL solutions and overcome challenges in the learning process by building on prior research on line chart visualizations that instructors can use to increase visibility on students who are struggling. However, a major limitation of this approach was the difficulty for instructors to sift through a large number of visuals representing each student's performance on a SQL problem and generate action items at scale, especially when dealing with enrollments of over 700 students. To overcome this limitation, we developed a novel technique to generate textual representations of the student submission sequence using global sequence alignment scores and regular expression algorithms to further compact these submission sequences. This allows instructors to gain insights quickly, on an aggregate level, and in an automated manner, enabling them to identify students who may be struggling with SQL based on their submission sequence characteristics and take appropriate action to improve database education. Our study discovered common textual submission patterns and pattern elements, and we present our recommendations to instructors to improve database education based on these findings.
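
As a rough illustration of the two ingredients named in this abstract, the sketch below computes a global sequence alignment score (the classic Needleman-Wunsch recurrence) and collapses repeated symbols with a regular expression. It is not the authors' implementation: the scoring values, the one-letter encoding of submissions (`E` for an error, `W` for a wrong answer, `C` for a correct one), and both function names are assumptions made only for this example.

```python
import re


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b.

    The match/mismatch/gap values are illustrative defaults, not taken
    from the paper.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def compact_runs(sequence):
    """Collapse each run of a repeated symbol into 'symbol+' (hypothetical encoding)."""
    return re.sub(r"(.)\1+", r"\1+", sequence)


# Hypothetical submission sequences for two students on one problem.
student_a = "EEEWWC"
student_b = "EEWC"

similarity = global_alignment_score(student_a, student_b)
pattern_a = compact_runs(student_a)
pattern_b = compact_runs(student_b)
```

Under these assumptions, two students whose compacted patterns coincide went through the same kinds of attempts in the same order, even if they needed a different number of tries at each stage.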

Offering Data Science Education to Non-Computing Majors - Xumin Liu, Erik Golen, Rajendra K. Ray and Kimberly Fluet

Data science courses offered by computing departments tend to be inappropriate for non-computing majors due to the emphasis on coding and a long chain of prerequisite courses in computer science and mathematics or statistics. Moreover, courses designed for computing majors by computing faculty do not always match the backgrounds and interests of students majoring in other disciplines. This paper discusses the motivation and challenges of offering an entry-level data science course for students in non-computing disciplines with limited coding experience. Experiences with the teaching of this course at the Rochester Institute of Technology are discussed. Preliminary assessment results have shown this approach to be useful.

Teaching Data Science by Visualizing Data Table Transformations: Pandas Tutor for Python, Tidy Data Tutor for R, and SQL Tutor - Sam Lau, Sean Kross, Eugene Wu and Philip J. Guo

Data science instructors often find it hard to explain to students how a piece of code written in Python, R, or SQL executes in order to transform tabular data. They currently resort to hand-drawing diagrams or making presentation slides to illustrate the semantics of operations such as filtering, sorting, reshaping, pivoting, grouping, and joining. These diagrams are time-consuming to create and do not synchronize with real code or data that students are learning about. In this paper we show that a step-by-step visual representation of tabular data transforms can help instructors to explain these operations. To do so, we created a table visualization library that illustrates the row-, column-, and cell-wise relationships between an operation's input and output tables. On top of this library we built a trio of free web-based visualization tools - Pandas Tutor for Python, Tidy Data Tutor for R tidyverse, and SQL Tutor - that run users' code and automatically produce diagrams of how Python/R/SQL transforms data tables step-by-step from input to output. Since launching in Dec 2021, over 61,000 people from over 160 countries have visited our website to try out these tools.

Relational Playground: Teaching the Duality of Relational Algebra and SQL - Michael J. Mior

Students in introductory data management courses are often taught how to write queries in SQL. This is a useful and practical skill, but it gives limited insight into how queries are processed by relational database engines. In contrast, relational algebra is a commonly used internal representation of queries by database engines, but can be challenging for students to grasp. We developed a tool we call Relational Playground for database students to explore the connection between relational algebra and SQL.
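
To make the connection concrete, here is a minimal sketch (not taken from Relational Playground itself) that evaluates the same query twice: once as a composition of relational algebra operators written as plain Python functions, and once as SQL in an in-memory SQLite database. The `student` table, its rows, and the `select` and `project` helpers are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical relation used only for this example.
rows = [
    {"name": "Ada", "dept": "CS", "year": 3},
    {"name": "Ben", "dept": "Math", "year": 2},
    {"name": "Cy", "dept": "CS", "year": 1},
]


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [r for r in relation if predicate(r)]


def project(relation, attrs):
    """Projection (pi): keep only the given attributes, dropping duplicates."""
    result = []
    for r in relation:
        t = tuple(r[a] for a in attrs)
        if t not in result:
            result.append(t)
    return result


# Relational algebra: pi_name(sigma_{dept = 'CS'}(student))
algebra_result = project(select(rows, lambda r: r["dept"] == "CS"), ["name"])

# The same query expressed in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, dept TEXT, year INTEGER)")
conn.executemany("INSERT INTO student VALUES (:name, :dept, :year)", rows)
sql_result = conn.execute(
    "SELECT DISTINCT name FROM student WHERE dept = 'CS'"
).fetchall()
conn.close()

# Both formulations should return the same set of tuples.
same_answer = sorted(algebra_result) == sorted(sql_result)
```

The `WHERE` clause plays the role of selection and the `SELECT DISTINCT` list plays the role of projection; `DISTINCT` is needed because SQL uses bag semantics by default, whereas relational algebra works on sets.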