Music is one of the world’s most important cultural forms and one of the most dynamic. Such a dynamic nature can directly influence artists’ careers and reflect their success. In this work, we analyze musical success from a genre-oriented perspective. Specifically, we model both artist and genre success timelines to detect and predict continuous periods with higher impact, i.e., hot streaks. As artist collaboration becomes one of the main strategies to promote new songs, we build and characterize success-based genre collaboration networks for nine markets worldwide. From such networks, we detect collaboration profiles directly related to musical success. Furthermore, we mine exceptional genre patterns in the networks where the success deviates from the average. Our findings show that studying genre collaboration is a powerful way to assess musical success by describing similar behaviors within collaborative songs from multiple perspectives. In addition, considering both global and regional markets is fundamental, as each country has its success dynamics and genre preferences. Such a regional approach also reveals local patterns that shape the global environment. Overall, our work contributes to both the academy and the music industry, as we shed light on the underlying factors of the science behind musical success.
conference & journal articles
2024
SBBD
Evaluating Domain-adapted Language Models for Governmental Text Classification Tasks in Portuguese
Domain-adaptive pre-training (DAPT) is a technique in natural language processing (NLP) that tailors pre-trained language models to specific domains, enhancing their performance in real-world applications. In this paper, we evaluate the effectiveness of DAPT in governmental text classification tasks, exploring how different factors, such as target domain dataset, pre-trained model language composition, and dataset size, impact model performance. We systematically vary these factors, creating distinct domain-adapted models derived from BERTimbau and LaBSE. Our experimental results reveal that selecting appropriate target domain datasets and pre-training strategies can notably enhance the performance of language models in governmental tasks.
DSW
ICPSet: A Structured Dataset of Public Procurement Items
Transparency and efficiency in public procurement management are essential to ensure the proper use of public resources. However, the complexity and diversity of procured items pose a significant challenge for analyzing and monitoring these purchases. This paper presents the ICPSet, a structured dataset designed to facilitate the analysis of public procurement data. Containing over 30 million standardized and structured items, the ICPSet provides a robust basis for various analyses and tool development.
DS4SG @ SBBD
Data Insights on Gender Representation: Analyzing the Book and Music Industries
The entertainment industry has been historically dominated by men, which motivates growing recognition and advocacy for improved gender diversity and equality. We present a study on gender representation in the book and music industries by analyzing awarded authors and hit song artists. Through Data Science, we uncover patterns and trends that beg for a more balanced and diverse portrayal of gender in creative expressions and offer insights to foster inclusivity, diversity, and equitable opportunities in such a domain.
DS4SG @ SBBD
A Multidisciplinary Approach to Detecting Irregularities in Public Procurement
In Brazil, open government data has become a fundamental tool for transparency and control in public procurement. In this context, this paper proposes a multidisciplinary approach to detecting irregularities in public procurement through audit trail modeling. Three trails were defined to audit different types of procurement processes: bid dispensations, invitation letters, and exemptions. Overall, our results demonstrate that the proposed methodology is effective in identifying and prioritizing irregularity alerts, providing an initial analysis that facilitates the screening of large volumes of data.
DS-CoPS @ SBBD
Data Science and Transparency: Experiences with Public Data from SICOM
L. G. L. Costa, M. T. Dutra, G. P. Oliveira, M. O. Silva, D. C. Soares, L. C. S. Faria, W. Meira Jr., and G. L. Pappa
In Companion Proceedings of the 39th Brazilian Symposium on Databases, 2024
This paper presents the experience report of the Programa Capacidades Analíticas (PCA) on the use of the public database from the Sistema Informatizado de Contas dos Municípios (SICOM) for applying data science to governmental expenditures. Using the SICOM database, PCA developed advanced techniques in artificial intelligence and data science to analyze public procurement and municipal expenditures and to detect irregularities.
DS-CoPS @ SBBD
Quanto Custa: Public Procurement Price Bank of the State of Minas Gerais
L. G. L. Costa, M. T. Dutra, G. P. Oliveira, M. O. Silva, D. C. Soares, L. C. S. Faria, W. Meira Jr., and G. L. Pappa
In Companion Proceedings of the 39th Brazilian Symposium on Databases, 2024
This paper introduces Quanto Custa, a system developed by the Prosecution Service of the State of Minas Gerais in partnership with the Federal University of Minas Gerais to facilitate the querying and analysis of prices for public procurement items in the state. The system integrates data from bids, contracts, and invoices, and uses external sources such as ANVISA, ANP, and CEASA-MG to establish reference prices. Through features including item disambiguation and price variation analysis, Quanto Custa aims to enhance transparency and combat fraud in public procurement.
Vórtex
A quantitative comparison of viral and hit songs in the Brazilian music market
G. P. Oliveira, A. P. Couto da Silva, and M. M. Moro
It is common for songs to go viral on streaming platforms and social media, but not all viral songs become hits. In this context, we aim to discover what differentiates viral from hit songs beyond their definition. We do so by using a quantitative methodology over charts in the Brazilian market. We compare hit and viral songs regarding their intrinsic and extrinsic characteristics, and our results reveal significant differences between them. Features such as music genres, lyrics topics, and emotions emerge as crucial elements for distinguishing such songs within the Brazilian context. Furthermore, temporal features indicate differences in the diffusion processes between hits and virals. Overall, this study offers insights into music consumption in Brazil, revealing the connection between song features and their success and virality on streaming platforms.
SEMISH
Analyzing the temporal relation between virality and success in the Brazilian music market
G. P. Oliveira, A. P. Couto da Silva, and M. M. Moro
In Proceedings of the 51st Integrated Software and Hardware Seminar, 2024
Content virality on social media platforms is essential to modern digital culture. In music, viral songs often gain widespread attention through catchy melodies, relatable lyrics, and captivating visuals. Indeed, social platforms have reshaped music consumption, with viral trends often leading to mainstream success. This study investigates the relationship between music virality and success in Brazil by analyzing their evolution in streaming platforms over time. Through correlation and Granger Causality analyses, we explore the dynamics between these facets of music popularity. Our results show that virality can be used to forecast future success and vice versa, but this cannot be generalized to all songs. Such findings reinforce the differences between the concepts of virality and success despite their symbiotic relationship driven by social platforms.
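As a rough illustration of the correlation side of such an analysis, the sketch below computes a lagged Pearson correlation between two toy time series. It is not the Granger Causality test used in the paper, and the series, the function names, and the lag range are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

# Toy data (assumption): success repeats virality two time steps later.
virality = [1, 3, 7, 4, 2, 1, 1, 5, 9, 3]
success = [0, 0] + virality[:-2]

# The lag with the highest correlation hints at how far one series trails the other.
best_lag = max(range(4), key=lambda k: lagged_correlation(virality, success, k))
print(best_lag)
```

A high correlation at a positive lag only suggests that one series tends to precede the other; a causality test adds a formal check of whether past values of one series improve forecasts of the other.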
WCGE
Trilhas de Auditagem para Detecção de Fraudes Envolvendo Servidores Públicos da Saúde
Identifying and preventing fraud in the public sector, especially in the context of healthcare, are crucial issues to guarantee the integrity of resources and the quality of services provided to the population. This paper proposes an approach based on audit trail modeling to identify and rank fraud alerts involving public health employees. By analyzing suspicious patterns in public employee records, we propose a ranking system that directs audit efforts to cases with the highest probability of fraud. The results obtained using our approach provide essential information that simplifies the subsequent manual investigation step carried out by auditors.
WebSci
What makes a viral song? Unraveling music virality factors
G. P. Oliveira, A. P. Couto da Silva, and M. M. Moro
In Proceedings of the 16th ACM Web Science Conference, 2024
The viral phenomenon is present in several contexts, combining the advantages of streaming platforms and other social networks. Music is no exception. Viral songs are widely shared in a short amount of time, and they may become successful by reaching the top of the charts with millions of streams and digital sales. However, not all songs that go viral become hits, as the sharing process is not enough to be converted into streams. In this work, we analyze viral and hit songs as two different yet interconnected aspects of music popularity. Specifically, we aim to uncover factors that are relevant to distinguishing viral from hit songs. We evaluate three research hypotheses related to musical features, and the results reveal that considering only acoustic and other intrinsic features is not enough, and that extrinsic features (e.g., artists’ genre and the time from the release to the first chart entry) are essential in achieving such a goal. Moreover, artist-related and temporal features are among the most relevant indicators to differentiate hit and viral songs. Overall, our findings offer relevant insights for understanding the dynamics of music consumption and its sharing on online platforms, as we reveal important factors that describe what makes a viral song and what differentiates it from a hit.
LREC-COLING
Unsupervised Grouping of Public Procurement Similar Items: Which Text Representation Should I Use?
In public procurement, establishing reference prices is essential to guide competitors in setting product prices. Grouping purchased products, which are not standardized by default, is necessary to estimate reference prices. Text clustering techniques can be used to group similar items based on their descriptions, enabling the definition of reference prices for specific products or services. However, selecting an appropriate representation for text is challenging. This paper introduces a framework for text cleaning, extraction, and representation. We test eight distinct sentence representations tailored for public procurement item descriptions. Among these representations, we propose an approach that captures the most important components of item descriptions. Through extensive evaluation of a dataset comprising over 2 million items, our findings show that using sophisticated supervised methods to derive vectors for unsupervised tasks offers little advantage over leveraging unsupervised methods. Our results also highlight that domain-specific contextual knowledge is crucial for representation improvement.
JIDM
LiPSet: A Comprehensive Dataset of Labeled Portuguese Public Bidding Documents
M. O. Silva, G. P. Oliveira, H. Hott, L. D. Gomide, B. M. A. Mendes, C. A. Bacha, L. L. Costa, M. A. Brandão, A. Lacerda, and G. Pappa
Collecting, processing, and organizing governmental public documents pose significant challenges due to their diverse sources and formats, complicating data analysis. In this context, this work introduces LiPSet, a comprehensive dataset of labeled documents from Brazilian public bidding processes in Minas Gerais state. We provide an overview of the data collection process and present a methodology for data labeling that includes a meta-classifier to assist in the manual labeling process. Next, we perform an exploratory data analysis to summarize the key features and contributions of the LiPSet dataset. We also showcase a practical application of LiPSet by employing it as input data for classifying bidding documents. The results of the classification task exhibit promising performance, demonstrating the potential of LiPSet for training neural network models. Finally, we discuss various applications of LiPSet and highlight the primary challenges associated with its utilization.
JIS
Exploring Irregularities in Brazilian Public Bids: An In-depth Analysis on Small Companies
In Brazil, bidding processes constitute the main method through which the Public Administration acquires goods and services, and they aim to select the best proposal among several bidding companies. Analyzing public bids can reveal several negotiating characteristics between companies and the public sector, including alerts of fraudulent activities involving such businesses. This article presents two approaches for detecting irregularities within small companies using data extracted from public bids in the Brazilian state of Minas Gerais. For each approach, we perform exploratory and geospatial analysis to better understand specific characteristics of the companies with irregularity alerts. Furthermore, we execute a network analysis to examine the underlying connections between such companies. Our findings reveal the efficacy of both approaches in indicating small companies that may be involved in fraudulent activities. Our methodology and results represent a significant advance for the public sector as they have the potential to enhance mechanisms for overseeing and preventing fraud within bidding processes.
JIS
Overpricing Analysis in Brazilian Public Bidding Items
Analyzing overpricing in public bidding items is essential for government agencies to detect signs of fraud in acquiring public goods and services. In this context, this paper presents two main contributions: a methodology for processing and standardizing bid item descriptions and a statistical approach for overpricing detection using the interquartile range. We conducted a comparative analysis of three distinct grouping strategies, each emphasizing different facets of the item description standardization process. Furthermore, to gauge the efficacy of both proposed approaches, we leveraged a ground-truth dataset for a thorough evaluation containing quantitative and qualitative analyses. Overall, our findings suggest that the evaluated strategies are promising for identifying potential irregularities within public bidding processes.
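To make the interquartile-range idea concrete, here is a minimal sketch that flags prices above an upper fence within one group of similar items. The fence multiplier of 1.5 (Tukey's rule), the function name, and the sample prices are assumptions for illustration, not the paper's actual parameters.

```python
from statistics import quantiles

def flag_overpriced(prices, k=1.5):
    """Return the prices above Q3 + k * IQR (an upper fence; k is an assumption)."""
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return [p for p in prices if p > upper_fence]

# Hypothetical unit prices for one group of items with standardized descriptions.
group_prices = [10.0, 11.0, 9.5, 10.5, 12.0, 10.0, 48.0]
print(flag_overpriced(group_prices))  # → [48.0]
```

The quality of the grouping matters as much as the statistic: if unrelated items share a group, the quartiles no longer describe a single product and the fence loses meaning.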
Book Chapter
Premiação das mulheres na literatura e na música: análises de dados da Billboard e do Goodreads
Historically, the entertainment industry has been dominated by men, but in recent years there has been a growing recognition and advocacy for better gender representation and equality in these creative fields. In this context, this chapter presents a comprehensive study of gender representation in the book and music industries, analyzing award-winning writers and successful artists. By employing a data-driven approach, we reveal patterns and trends that exist in gender representation in these two industries. The findings highlight the need for a more balanced and representative portrayal of gender in creative expressions and offer ideas to promote inclusion, diversity and equal opportunities in this field.
DGOV
PLUS: A Semi-automated Pipeline for Fraud Detection in Public Bids
M. A. Brandão, A. P. G. Reis, B. M. A. Mendes, C. A. B. Almeida, G. P. Oliveira, H. Hott, L. D. Gomide, L. L. Costa, M. O. Silva, A. Lacerda, and G. L. Pappa
The diversity of sources and formats of public bidding documents makes collecting, processing, and organizing such documents challenging from the point of view of data analysis. Thus, developing approaches to deal with such data is relevant, since analyzing them can broaden the inclusion of people by giving them more access to public decisions and expenditures, increase transparency in the public sector, and give citizens a greater sense of responsibility, as they gain different points of view on the government’s performance in meeting its public policy goals. In this context, we propose PLUS, a semi-automated pipeline for fraud detection in public bids. PLUS comprises a heuristic meta-classifier for bidding documents and a data quality module. Both modules present promising results after a proof of concept, reinforcing the relevance of PLUS for automating the bidding process investigation. Then, we present two applications of PLUS on real-world data: the construction of audit trails for fraud detection and a price database for overpricing detection. Such applications show a significant reduction in specialists’ work when searching for irregularities in public bids.
2023
JNMR
Hit song science: a comprehensive survey and research directions
Hit Song Science (HSS) is an emerging topic that aims to unveil the success dynamics within the music industry. Considering the growth of the area, we provide a comprehensive study with a complete review of the main topics of this interdisciplinary field from a computer science perspective. We also define a generic workflow for HSS, introduce taxonomies for success measures and musical features, and categorize the main current learning algorithms. Overall, this survey may serve as a starting point for future research on HSS, as it emerges as a promising field that benefits both the academy and the music industry.
Scientometrics
Hot streaks in the music industry: identifying and characterizing above-average success periods in artists’ careers
In this work, we reveal fundamental patterns that appear in individual musical careers. Such careers may go through ups and downs depending on the current market moment and release of new songs. In particular, they face hot streak periods in which high-impact bursts occur in sequence. Identifying such periods and even predicting them may help in other practical issues, which include foreseeing success and recommending artists. After modeling artists’ careers as time series, we find a general trend of clustering within the most successful weeks, which justifies the applicability of the concept of hot streaks. Hence, we use a specific methodology for identifying hot streaks, whose evaluation results reveal meaningful patterns for artists of different genres. We also confirm the career peaks of artists appear and disappear progressively over time. Overall, our findings shed light on the science of musical success as we observe the temporal evolution of artists’ careers and their hot streaks.
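As a simplified illustration of what a hot streak looks like in a career time series, the sketch below finds runs of consecutive weeks whose success value stays above a threshold. The threshold rule, the minimum run length, and the weekly values are assumptions; the paper relies on its own identification methodology.

```python
def hot_streaks(series, threshold, min_length=2):
    """Return (start, end) index pairs of runs where series values exceed threshold."""
    streaks, start = [], None
    for i, value in enumerate(series):
        if value > threshold and start is None:
            start = i  # a new run begins
        elif value <= threshold and start is not None:
            if i - start >= min_length:
                streaks.append((start, i - 1))
            start = None  # the run has ended
    # Close a run that lasts until the end of the series.
    if start is not None and len(series) - start >= min_length:
        streaks.append((start, len(series) - 1))
    return streaks

# Hypothetical weekly success scores for one artist.
weekly_success = [1, 5, 6, 7, 2, 8, 1, 9, 9]
print(hot_streaks(weekly_success, threshold=4))  # → [(1, 3), (7, 8)]
```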
The music industry has always been complex and competitive. Nowadays, combining different genres has become a common practice to promote new music and reach new audiences. Given the diversity of combinations between all genres, predictive and descriptive analyses are very challenging. Here, our goal is to mine frequent and exceptional patterns in music collaborations that have achieved success in both global and regional markets. We use the Apriori algorithm to mine genre patterns and association rules that reveal how music genres combine with each other in each market. The results show significant differences in the behavior of each market and a strong influence of the regional factor on musical success. In addition, we are able to use such patterns to identify and recommend promising genre combinations for such markets through the association rules.
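The pattern-mining step can be pictured with a compact, pure-Python version of Apriori applied to sets of genres per song. This is a teaching sketch under assumed data: the genre sets, the support threshold, and the helper names are hypothetical, and a real study would likely rely on an established library.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        # Keep the size-k candidates that are frequent enough.
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune those with
        # an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def rule_confidence(frequent, antecedent, consequent):
    """confidence(A -> C) = support(A and C together) / support(A)."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / frequent[a]

# Hypothetical genre sets of five collaborative hit songs in one market.
songs = [
    {"pop", "hip hop"},
    {"pop", "latin"},
    {"pop", "hip hop", "latin"},
    {"rock"},
    {"pop", "hip hop"},
]
frequent = apriori(songs, min_support=0.4)
print(rule_confidence(frequent, {"hip hop"}, {"pop"}))  # → 1.0
```

Running the same procedure separately per market is what would expose regional differences: the same genre pair can be frequent in one market and absent in another.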
DSW
MGD+: An Enhanced Music Genre Dataset with Success-based Networks
Streaming platforms like Spotify have revolutionized music consumption, generating large volumes of data on hit songs. Such data serve as input to analyzing the music community and to the field of Music Information Retrieval. In this context, we present MGD+: an enhanced Music Genre Dataset with Success-based Networks. By combining Spotify chart data with acoustic metadata, we capture the evolution of musical careers. We further enhance the dataset with a genre-based collaboration network, represented as a graph, connecting artists through collaborations. MGD+ enables building success-based time series across several music markets, offers a friendly interface, and allows reproducibility, making it a valuable tool for music-related tasks.
SBBD
Impacto de Doações Eleitorais no Faturamento de Empresas: Um Estudo nas Eleições Municipais em Minas Gerais
B. M. A. Mendes, C. S. Braz, L. L. Costa, G. P. Oliveira, H. R. Hott, M. O. Silva, and G. L. Pappa
In Proceedings of the 38th Brazilian Symposium on Databases, 2023
In Brazil, the prohibition of corporate donations to political campaigns in 2018 aims to strengthen popular participation in the electoral process and reduce the influence of economic power. In this context, this study aims to identify companies whose revenue increased through donations from their partners to the 2020 municipal elections in the state of Minas Gerais. Through experiments using public and private data, we identified suspicious cases of favoritism, where political campaign donations resulted in increased revenue for the donating companies through bidding processes. Overall, our results provide important insights into political campaign donations in Brazil, highlighting the significance of transparency, integrity, and democracy in the electoral process.
SBBD
Impacto do Pré-processamento e Representação Textual na Classificação de Documentos de Licitações
Classifying public bidding documents is relevant for public and private bodies seeking accurate information about such processes. In this work, we investigate the impact of different preprocessing approaches and textual representation models of word embeddings on the effectiveness of the classification of bidding documents. The results show that the preprocessing does not significantly impact the classification result and that the textual representation is essential for the document classes to be more representative.
CTDBD @ SBBD
Analyses of Musical Success based on Time, Genre and Collaboration
Music holds a significant position in global culture, as it is one of the world’s most important and dynamic cultural forms. With the vast amount of music-related data available on the Web, new opportunities emerge for extracting knowledge and benefiting different music segments. In this work, we perform a data-driven analysis to investigate musical success from a genre-oriented perspective. Specifically, we model both artist and genre success timelines to detect and predict continuous periods with higher impact. We also build success-based genre collaboration networks to detect collaboration profiles directly related to success. Furthermore, we use data mining techniques to uncover exceptional genre patterns in the networks where the success deviates from the average. Our findings show that studying genre collaboration is a powerful way to assess musical success by describing similar behaviors within collaborative songs. Overall, our work contributes to both the academy and the music industry, as we shed light on the underlying factors of the science behind musical success.
BRACIS
Evaluating Contextualized Embeddings for Topic Modeling in Public Bidding Domain
Public procurement plays a crucial role in government operations by acquiring goods and services through competitive bidding processes. However, the increasing volume of procurement data has made manual analysis impractical and time-consuming. Therefore, text clustering and topic modeling techniques have been widely used to uncover hidden patterns in unstructured text data. This paper leverages the power of BERT-based models to overcome the challenges associated with analyzing public procurement data. Specifically, we employ BERTopic, a topic modeling technique based on BERT, to generate clusters that capture the underlying topics in procurement data. Additionally, we evaluate several sentence embedding models for representing procurement documents. By combining BERT-based models and advanced sentence embeddings, we aim to enhance the accuracy and interpretability of topic modeling in public procurement analysis. Our results provide valuable insights into the underlying topics within the data, aiding decision-making processes and improving the efficiency of procurement operations.
iSys
Identification of suspected fraud bids through audit trails
L. L. Costa, C. A. Bacha, G. P. Oliveira, M. O. Silva, M. C. Teixeira, M. A. Brandão, A. M. Lacerda, and G. L. Pappa
Different information technologies help to promote government transparency, made possible by agreements promoting and encouraging open data. Public bids are a specific type of this data, made available by the Brazilian government, and aim to ensure transparency and free competition between bidders. However, auditing for irregularities is a non-trivial task due to the massive volume of data and the reduced number of specialists. Thus, this work proposes a methodology based on concepts of audit trails and social networks to create fraud alerts in bids. We also propose an approach to ranking bids according to these trails. The results reveal that our proposal helps in the fight against corruption by being able to identify suspicious bids.
JIDM
Assessing Data Quality Inconsistencies in Brazilian Governmental Data
G. P. Oliveira, B. M. A. Mendes, C. A. Bacha, L. L. Costa, L. D. Gomide, M. O. Silva, M. A. Brandão, A. Lacerda, and G. L. Pappa
In recent years, vast volumes of data have been made available on the Web, and they have been increasingly used as decision support in different contexts. However, for these decisions to be more accurate and reliable, it is necessary to ensure data quality. Although there are several definitions for this area, there is a consensus that data quality is always associated with a specific context. This work aims to analyze data quality in a data warehouse with governmental information of the Brazilian state of Minas Gerais. We first present a brief comparison of eight open-source data quality tools and then choose the Great Expectations tool for analyzing such data in two real applications: public bids and public expenditure. Our analyses show that the chosen tool has relevant characteristics to generate good data quality indicators that reveal data quality issues that may directly impact the construction of final applications using such data.
BraSNAM
Exceptional Collaboration Patterns in Music Genre Networks
Music is one of the world’s most important cultural forms, and also one of the most dynamic. Such a dynamic nature can directly influence artists’ careers and reflect their success. In this work, we combine social networks and data mining techniques to analyze musical success from a genre-oriented perspective. Our goal is to mine exceptional collaboration patterns in success-based genre networks where the success deviates from the average. We conduct our analyses for global and eight regional markets, and the results show that each market has specific patterns of genre connections in which success is above average. Hence, our findings serve as a first step in developing strategies to promote future song releases across the world.
BraSNAM
Analyzing Character Networks in Portuguese-language Literary Works
Literary works are complex narratives with multifaceted character relationships. Studying these relationships can reveal important insights into the story’s structure and each character’s contribution to the plot development. This research investigates character networks in Portuguese-language literature using two main analytical approaches: structural network analysis and character importance metrics. Our analyses emphasize the significance of character networks in understanding the narrative structure of literary works and reveal the intricate interplay between characters in Portuguese-language literature. These findings deepen our comprehension of literary works’ fundamental structure and the characters’ pivotal role in shaping the story.
BraSNAM
Ranqueamento de Licitações Públicas a partir de Alertas de Fraude
Fraud detection is a complex task in various scenarios. This work proposes an approach to rank public tenders suspected of fraud. For this, we have created 19 audit trails, which were modeled as a social network, and a strategy to consider the alerts they raised in the ranking. The results reveal that the proposed ranking approach can correctly identify bids suspected of fraud.
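One simple way to picture alert-based ranking is to score each bid by the total weight of the audit trails that fired on it and sort by that score. The sketch below does exactly that; the trail identifiers, the weights, and the additive scoring rule are hypothetical and are not taken from the paper.

```python
def rank_bids(alerts_by_bid, trail_weights):
    """Sort bid ids from most to least suspicious by summed trail weights."""
    def score(bid):
        # Trails without an explicit weight count as 1.0 (an assumption).
        return sum(trail_weights.get(trail, 1.0) for trail in alerts_by_bid[bid])
    return sorted(alerts_by_bid, key=score, reverse=True)

# Hypothetical alerts: which audit trails fired for each bid.
alerts_by_bid = {
    "bid-A": {"T01"},
    "bid-B": {"T01", "T02", "T03"},
    "bid-C": set(),
}
trail_weights = {"T02": 3.0}  # assume trail T02 is considered more severe
print(rank_bids(alerts_by_bid, trail_weights))  # → ['bid-B', 'bid-A', 'bid-C']
```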
WCGE
Análise de Irregularidades em Licitações Públicas com Foco em Empresas de Pequeno Porte
Analyzing public bids can reveal several negotiating characteristics between companies and the public sector. Unfortunately, this analysis can also give evidence of fraud involving companies. This paper describes two approaches that help identify irregularities in small businesses based on data extracted from public tenders in the State of Minas Gerais. The results indicate that both approaches can locate small companies suspected of involvement in fraud. Such approaches are essential as they can improve mechanisms for controlling and preventing fraud in the bidding process.
WCGE
Análise de Sobrepreço em Itens de Licitações Públicas
The analysis of overpricing in public bidding items can help government agencies identify indications of fraud in the acquisition of public goods or services. In this context, this article presents two main contributions: a methodology for treating and standardizing the description of bid items; and a statistical approach for detecting overpricing based on grouping performed with item descriptions. The results indicate that the proposed strategies are promising for identifying possible irregularities in public purchases.
Vórtex
Temporal Success Analyses in Music Collaboration Networks: Brazilian and Global Scenarios
Collaboration is a part of the music industry and has increased over recent decades, but we know little about its effects on success and evolution. Our goal is to analyze how success has evolved over collaboration networks and compare its global scenario to a local, thriving one: the Brazilian music industry. Specifically, we build collaboration networks from data collected from Spotify’s Global and Brazilian daily charts, analyze them and identify collaboration profiles in such networks. Analyses over their topological characteristics reveal collaboration patterns mapped into four different profiles: Standard, Niche, Ephemeral and Absent, where the first two have a higher level of success. Furthermore, we go deeper by evaluating the temporal evolution of such profiles through case studies: pop and k-pop globally, and pop and forró in Brazil. Overall, our findings emphasize the importance of collaboration profiles in assessing success, and show differences between the global and Brazilian scenarios.
In a streaming-oriented era, predicting which songs will be successful is a significant challenge for the music industry. Indeed, there are many efforts in determining the driving factors that contribute to a song’s success, and one potential solution could be incorporating artistic collaborations, as it allows for a wider audience reach. Therefore, we propose a multi-perspective approach that includes collaboration between artists as a factor for hit song prediction. Specifically, by combining online data from Billboard and Spotify, we tackle the problem as both classification and hit song placement tasks, applying five different model variants. Our results show that relying only on music-related features is not enough, whereas models that also consider collaboration features produce better results.
SQJ
How do developers collaborate? Investigating GitHub heterogeneous networks
G. P. Oliveira, A. F. C. Moura, N. A. Batista, M. A. Brandão, A. Hora, and M. M. Moro
Assessing the collaboration among developers is important to understand different aspects of the software lifecycle, including code smell intensity, bug fixes, and software quality. This kind of collaboration can be obtained from social networks, which represent interactions between individuals in different contexts. In this paper, we model GitHub developers’ collaborations in a heterogeneous network by considering three aspects: social collaboration, collaboration time in a repository and technical features. Then, we explore the GitHub network from different perspectives: size, relevance, and potential applications. The results show that the considered metrics are not correlated, bringing new information about the collaborations. We also show that such information is useful for social developer ranking, an actual task which is often part of different applications, such as team formation, community detection and pair programming. Finally, as software quality is intrinsic to the people who code it, our methodology and analyses represent initial steps towards people-centered software quality analysis, as further discussed throughout this article.
2022
CTIC @ WebMedia
Mood Analysis during the COVID-19 Pandemic in Brazil through Music
In this paper, we investigate the oscillation in the general feelings of the Brazilian population during the pandemic through the songs consumed. We analyze Brazilian streaming musical consumption between 2019 and 2021. In particular, we focus on key dates that have changed history, such as the beginning of the pandemic in the country, the dates of increase in cases, milestone dates in deaths, and the beginning of vaccination, among others. Data was collected through the Spotify API and made publicly available. Our results show that people preferred more danceable and positive songs during the period analyzed.
WebMedia
Collaboration as a Driving Factor for Hit Song Classification
The Web has transformed many services and products, including the way we consume music. In the current streaming-oriented era, predicting hit songs is a major open issue for the music industry. Indeed, there are many efforts to find the driving factors that shape the success of songs. Another feature that may improve such efforts is artistic collaboration, as it allows the songs to reach a wider audience. Therefore, we propose a multi-perspective approach that includes collaboration between artists as a factor for hit song prediction. Specifically, by combining online data from Billboard and Spotify, we model the issue as a binary classification task by using different model variants. Our results show that relying only on music-related features is not enough, whereas models that also consider collaboration features produce better results.
WebMedia
Detecting Inconsistencies in Public Bids: An Automated and Data-based Approach
G. P. Oliveira, A. P. G. Reis, F. A. N. Freitas, L. L. Costa, M. O. Silva, P. P. V. Brum, S. E. L. Oliveira, M. A. Brandão, A. Lacerda, and G. L. Pappa
In Proceedings of the 28th Brazilian Symposium on Multimedia and Web, 2022
One application of government data is the detection of irregularities that may indicate fraud in the public sector. This paper presents an approach that analyzes public bidding data available on the Web to detect bidder inconsistencies. Specifically, we propose a hierarchical decision approach from public bidding data, where each bidder is classified as Valid, Doubtful, or Invalid, based on the compatibility between the bidding items and the divisions of the CNAE codes (National Classification of Economic Activities). The results reveal that combining commonly available data on bidders and extracting the description of bid items can help in fraud detection. Furthermore, the proposed approach can reduce the number of bids a specialist must analyze to detect fraud, making it easier to identify inconsistencies.
iSys
Brazilian Reading Preferences in Goodreads: Cross-state and Cross-region Analyses
As a multicultural and ethnically diverse nation, Brazil has singular cultural identities in accents, gastronomy and traditions, also reflected in its literature. Here, we model a multipartite network to perform cross-state comparison analyses based on the cosine distance for Brazilian reading preferences. We also explore the impact of the relationships between geographic, socioeconomic, and demographic factors and both shared books and literary genres across Brazilian states. Finally, we extract the backbone of networks to identify cultural clusters in Brazil and each of its macro-regions. Such cross-state analyses highlight the country’s rich cultural diversity, where each region shows its own identity. Our findings open opportunities to the book industry by enhancing current knowledge on social indicators related to reading preferences.
DSW
LiPSet: Um conjunto de Dados com Documentos Rotulados de Licitações Públicas
M. O. Silva, A. F. Paula, G. P. Oliveira, I. A. D. Vaz, H. Hott, L. D. Gomide, A. P. G. Reis, B. M. A. Mendes, C. A. Bacha, L. L. Costa, M. A. Brandão, A. Lacerda, and G. L. Pappa
In Proceedings of the 4th Dataset Showcase Workshop, 2022
In this work, we present LiPSet, a dataset with labeled documents from public bids from Minas Gerais. After an overview of the manual collection and labeling process, we present a brief exploratory data analysis to summarize the main features and contributions of the proposed dataset. In addition, we discuss potential applications and main challenges involving the use of LiPSet.
SBBD
Ferramentas open-source de qualidade de dados para licitações públicas: Uma análise comparativa
G. P. Oliveira, A. P. G. Reis, B. M. A. Mendes, C. A. Bacha, L. L. Costa, G. L. Canguçu, M. O. Silva, V. Caetano, M. A. Brandão, A. Lacerda, and G. L. Pappa
In Proceedings of the 37th Brazilian Symposium on Databases, 2022
Data have been increasingly used as decision support in different contexts. For these decisions to be reliable, it is necessary to ensure data quality. In this context, this work presents a brief comparison of eight open-source data quality tools. We then choose one tool for analyzing an actual data warehouse formed by public bids. Finally, our analyses show that the Great Expectations tool has relevant characteristics to generate good data quality indicators, thus ensuring that public bidding data can help in the decision-making process.
WTAG @ SBBD
Análise do Sucesso Musical no Brasil Utilizando Dados do Twitter
Our goal is to analyze how Twitter data relates to the success of musical artistic careers. First, we collect data on tweets, number of likes and retweets for each artist profile from Spotify Charts in Brazil. From the data collected, we build time series to represent the career of each artist, and then we investigate whether the most successful periods occur close to each other. Such an exploratory analysis helps to identify temporal patterns that can reveal the existence of hot streaks, i.e., periods of above-normal success. Finally, we analyze the most frequent terms before and after the artists’ peaks of success.
JIDM
Musical Success in the United States and Brazil: Novel Datasets and Temporal Analyses
Music is not only a worldwide essential cultural industry but also one of the most dynamic. The increasing volume of complex music-related data defines new challenges and opportunities for extracting knowledge, benefiting not only different music segments but also the Music Information Retrieval research field. In this article, we assess musical success in the United States and Brazil, two of the biggest music markets in the world. We first introduce MUHSIC and MUHSIC-BR, two novel datasets with enhanced success information that combine chart-related data with acoustic metadata to describe the temporal evolution of musical careers. Then, we use such enriched and curated data to cluster artists according to their success level by considering their high-impact periods (hot streaks). Our results reveal three groups with distinct success behavior over time. Furthermore, Brazil and the US present specific music success patterns regarding artists and genres, reflecting the importance of analyzing regional markets individually.
CSBC CTD
Analyses of Musical Success based on Time, Genre and Collaboration
Music is a living industry with an increasing volume of complex data that can benefit from Computer Science in different ways. Specifically, Music Information Retrieval is a research field aiming to extract meaningful information from musical content. In this work, we analyze musical success from a genre-oriented perspective. Specifically, we model both artist and genre success timelines to detect and predict continuous periods with higher impact. We also build success-based genre collaboration networks to detect collaboration profiles directly related to success. Furthermore, we mine exceptional genre patterns in the networks where the success deviates from the average. Our findings show that studying genre collaboration is a powerful way to assess musical success by describing similar behaviors within collaborative songs. Overall, our work contributes to both academia and the music industry, as we shed light on the underlying factors of the science behind musical success.
BraSNAM
Characterizing the Diffusion of Misinformation Regarding the CoronaVac Vaccine in Brazil
G. P. Oliveira, B. F. Paiva, A. P. C. Silva, and M. M. Moro
In Proceedings of the 11th Brazilian Workshop on Social Network Analysis and Mining, 2022
The start of vaccination against COVID-19 was an essential step towards the end of the pandemic. In Brazil, CoronaVac was the first vaccine to be applied in the immunization campaign, and it is one of the most used today. Still, CoronaVac has specific components that have driven the spread of misinformation online. In this work, we compare the dissemination of misinformation on Twitter about the approval of such a vaccine for adults and children. The results show that misinformation is significant on Twitter and that there has been a substantial change in the style of such content shared between 2021 and 2022, moving from a false narrative about the development of the vaccine to raising suspicions about the approval process by the health regulatory agency.
BraSNAM
Alertas de fraude em licitações: Uma abordagem baseada em redes sociais
L. L. Costa, A. P. G. Reis, C. A. Bacha, G. P. Oliveira, M. O. Silva, M. C. Teixeira, M. A. Brandão, A. Lacerda, and G. L. Pappa
In Proceedings of the 11th Brazilian Workshop on Social Network Analysis and Mining, 2022
In Brazil, public bids must guarantee transparency and free competition between bidders. However, monitoring irregularities is complex because it involves a huge volume of data and a small number of specialists. In this context, this work proposes a methodology based on the concepts of audit trails and social networks to raise fraud alerts and assist in the fight against corruption. The characterization and analysis of a real social network, associated with a case study of a possibly fraudulent bid, reveal that the methodology presented is able to identify suspicious bids, which are flagged by a set of audit trails.
Vórtex
From Compact Discs to Streaming: A Comparison of Eras within the Brazilian Market
The music industry has undergone many changes in the last few decades, notably since vinyl, cassettes and compact discs faded away as streaming platforms took the world by storm. This Digital evolution has made huge volumes of data about music consumption available. Based on such data, we perform cross-era comparisons between Physical and Digital media within the music market in Brazil. First, we build artists’ success time series to detect and characterize hot streak periods, defined as high-impact bursts that occur in sequence, in both eras. Then, we identify groups of artists with distinct success levels by applying a cluster analysis based on hot streaks’ features. We find the same clusters for both Physical and Digital eras: Spike Hit Artists, Big Hit Artists, and Top Hit Artists. Our results reveal significant changes in the music industry dynamics over the years by identifying the core of each era.
2021
SBCM
Hot Streaks in the Brazilian Music Market: A Comparison Between Physical and Digital Eras
Consuming music through streams has made huge volumes of data available. We collect a part of such data and perform cross-era comparative analyses between physical and digital media for successful artists within the music market in Brazil. Given an artist’s career, we focus on hot streak periods defined as high-impact bursts occurring in sequence. Specifically, we construct artists’ success time series to detect and characterize hot streak periods for both physical and digital eras. Then, we assess their features, analyze them in the genre scale, and perform a cluster analysis to identify groups of artists with distinct success levels. For both physical and digital eras, we find the same clusters: Spike Hit Artists, Big Hit Artists, and Top Hit Artists. Our insights shed light on significant changes in the dynamics of the music industry over the years, by identifying the core of each era.
DSW
MUHSIC: An Open Dataset with Temporal Musical Success Information
Music is a living industry with an increasing volume of complex data that creates new challenges and opportunities for extracting knowledge, benefiting not only the different music segments but also the Music Information Retrieval (MIR) community. In this paper, we present MUHSIC, a novel dataset with enhanced information on musical success. We focus on artists and genres by combining chart-related data with acoustic metadata to describe the temporal evolution of musical careers. The enriched and curated data allow building success-based time series to investigate high-impact periods (hot streaks) in such careers, transforming complex data into knowledge. Overall, MUHSIC is a relevant tool in music-related tasks due to its ease of use and replicability.
JAI
Ciência de Dados com Reprodutibilidade usando Jupyter
Data Science has become a trending research topic in Computer Science due to the growing interest in extracting knowledge from different data sources. In such a context, Jupyter Notebook has consolidated itself as one of the main tools used by data scientists to perform exploratory data analysis in a fast and straightforward way, with a high potential for code reproduction. Hence, this JAI aims to present Jupyter with reproducibility for developing Data Science projects. The content is tailored for students and professionals with some programming experience. In particular, we first introduce Jupyter and its general use to develop solutions for Data Science. Then, we present Jupyter advanced topics and address ways to promote open science. Finally, this JAI overviews Data Science with Jupyter Notebooks by combining concepts and theoretical foundations with practical examples and real-world data.
BraSNAM
Exploring Brazilian Cultural Identity Through Reading Preferences
In Brazil, each region has its own cultural identity regarding accent, gastronomy, and customs, all of which may be reflected in its literature. Specifically, we believe that the country’s background and contextual features are directly related to what people read. Hence, we perform a cross-state comparison analysis based on Brazilian reading preferences through a multipartite network model. Also, we explore the effects of socioeconomic and demographic factors on favorite books and writing genres. Such cross-state analyses highlight how culturally rich the country is, with each region having its own distinctive culture. Our findings offer great opportunities for the Brazilian book industry by enhancing current knowledge on social indicators related to reading preferences.
2020
WTIC @ WebMedia
Classification and Persistence Analysis of Tie Strength on GitHub
A. F. C. Moura, G. P. Oliveira, M. A. Brandão, and M. M. Moro
In Companion Proceedings of the 26th Brazilian Symposium on Web and Multimedia, 2020
Relationships between users in social networks are evaluated in different ways. Here, our goal is to measure the strength of the ties between GitHub users by considering the temporal aspect of the network. Specifically, we analyze the evolution of these relationships by applying a classification algorithm over a GitHub network and calculating their persistence across different classes. The results bring new information about the collaborative software development process on the platform.
WTDBD @ SBBD
Musical Genre Analysis Over Dynamic Success-based Networks
As the music industry becomes more complex, reaching a wider audience through collaboration is effective in maintaining the relevance of artists from distinct genres in the market. As genre is one of the most prominent high-level music descriptors, all music-related analyses may depend on it. In this study, we propose to analyze the relation between musicians teaming up on a hit song and its success from a genre perspective. Our methodology includes building success-based genre collaboration networks to detect collaboration profiles and studying their evolution over time. With this work, we aim to provide potential impact to both the research community and the music industry.
ISMIR
Detecting Collaboration Profiles in Success-based Music Genre Networks
We analyze and identify collaboration profiles in success-based music genre networks. Such networks are built upon data recently collected from both global and regional Spotify weekly charts. Overall, our findings reveal an increase in the number of distinct successful genres from high-potential markets, pointing out that local repertoire is more important than ever in building the global music ecosystem. We also detect collaboration patterns mapped into four different profiles: Solid, Regular, Bridge, and Emerging, of which the first two show higher average success. These findings indicate great opportunities for the music industry by revealing the driving power of inter-genre collaborations within regional and global markets.
2018
WebMedia
Tie Strength in GitHub Heterogeneous Networks
G. P. Oliveira, N. A. Batista, M. A. Brandão, and M. M. Moro
In Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, 2018
In social networks, the relationship between individuals is defined by many forms of interaction. Here, our goal is to measure the strength of the relationship between GitHub users by considering social and technical features. Thus, we model GitHub’s heterogeneous collaboration network with different types of interaction and propose new metrics for the strength of relationships. The results show that the new metrics are not correlated, bringing new information to the table. Finally, these metrics may become important tools to determine users’ influence and popularity.
SBBD
Utilização de Redes Heterogêneas para Medir a Força dos Relacionamentos no GitHub
G. P. Oliveira, N. A. Batista, M. A. Brandão, and M. M. Moro
In Proceedings of the 33rd Brazilian Symposium on Databases, 2018
Our goal is to measure the strength of the relationships between GitHub users by considering social and technical features. The contributions include a new heterogeneous graph model with different types of interactions and new metrics for the strength of such relationships. The results show the proposed metrics bring new information about the relationships.