I'm a beginner documenting my data journey and this is the list I wish I had from day one. Full article link in the comments. #DataAnalytics #DataJourney #Lifelonglearner #Blackwomenintech #Medium #Free #Datasets
Finally Outshining the Random Baseline: A Simple and Effective Solution for Active Learning in 3D...
Carsten T. Lüth, Jeremias Traub, Kim-Celine Kahl et al.
Action editor: Jose Dolz
https://openreview.net/forum?id=UamXueEaYW
#dataset #datasets #segmentation
Research Paper (preprint) "Linking Global #Science #Funding to Research #Publications" arxiv.org/pdf/2603.24147 #publications #scholcomm #datasets #data #funders
New paper from us: "A dataset of insect sounds from 459 species for bioacoustic machine learning", published in Scientific Data, led by Marius Faiß https://doi.org/10.1038/s41597-026-07123-4 #bioacoustics #datasets
#crime #forensics #datasets #fingerprints #NIST #AI
'A NIST collection of 10,000 fingerprints has now been fully annotated with details that will help train both human fingerprint examiners and AI tools.'
www.nist.gov/news-events/...
Theoretically Understanding Data Reconstruction Leakage in Federated Learning
Binghui Zhang, Zifan Wang, Meng Pang, Yuan Hong, Binghui Wang
Action editor: Jinghui Chen
https://openreview.net/forum?id=1UfDXeYxwk
#federated #privacy #datasets
Harmonised #datasets for the five themes of the NextGenerationEU recovery plan are now available for download.
These files include #data from five major #surveys that has been #harmonised to make it as comparable as possible, even if the #question text and response scales differed.
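The post does not say how the response scales were harmonised. One common approach, shown here purely as an illustrative assumption, is min-max rescaling: each answer is mapped onto a shared 0 to 1 range so that a 1-5 scale and a 0-10 scale become comparable. The function name `rescale_response` and the example scales are hypothetical and do not come from the published files.

```python
def rescale_response(value, scale_min, scale_max):
    """Map a survey answer onto a common 0-1 range (min-max rescaling).

    Illustrative assumption only: the harmonised files may use a
    different method (for example recoding into shared categories).
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= value <= scale_max:
        raise ValueError("value lies outside the response scale")
    return (value - scale_min) / (scale_max - scale_min)


# Hypothetical example: the midpoint of a 1-5 scale and the
# midpoint of a 0-10 scale land on the same harmonised value.
midpoint_five_point = rescale_response(3, 1, 5)
midpoint_eleven_point = rescale_response(5, 0, 10)
```

Rescaling keeps the ordering of answers within each survey but does not by itself correct for differences in question wording, which needs separate judgement.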
New #J2C Certification:
Reasoning-Driven Synthetic Data Generation and Evaluation
Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous
https://openreview.net/forum?id=NALsdGEPhB
#generate #annotators #datasets
⛰️🌍 Mountains are underrepresented in global #datasets, yet are critical for understanding #ClimateChange & its impacts.
Strengthening #observations in #OurChangingMountains is key. 🗝️
MRI contributed this perspective at last month's Global Climate Observing System #GCOS meeting.
📖👉️ buff.ly/3JMiBjv
Business & Consumer Intelligence You Won’t Find Anywhere Else
Structured datasets on companies, executives, consumers, and behavioral signals—ready for research, analysis, segmentation, or integration into your workflows.
mediumaxis.com
#datasets #intelligence #leadgeneration
DataSeer develops AI system to track dataset reuse: www.researchinformation.info/news/datasee...
#Data #LLM #LargeLanguageModel #OpenScience #OpenAccess #OA #Datasets #Stratos #AI #ArtificialIntelligence #ResearchData #DataSeer #Grants #MJFF
On the Importance of Pretraining Data Alignment for Atomic Property Prediction
Yasir M. Ghunaim, Hasan Abed Al Kader Hammoud, Bernard Ghanem
Action editor: Changyou Chen
https://openreview.net/forum?id=jfD9BsrDTb
#dataset #datasets #inception
But large #datasets bring challenges:
• Bias in digital data sources
• Measurement validity issues
• Risks of overfitting models
Therefore, validation and replication are essential in CSS research.
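As a minimal sketch of the validation step mentioned above, k-fold cross-validation holds out each part of the data once, so a model is always scored on rows it was not fitted on. This is one generic way to check for overfitting and is not taken from the post; the helper name `k_fold_indices` is hypothetical, and the rows are split in order without shuffling to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k-fold CV.

    Folds are contiguous and unshuffled; real analyses usually
    shuffle (or stratify) the rows before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Each of the 10 rows appears in exactly one validation fold.
folds = list(k_fold_indices(10, 5))
```

A model that scores well on the training indices but poorly on the held-out folds is likely overfitting; replication on an independent dataset is a separate check that this sketch does not cover.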
Executive summary of the report on Spanish datasets in Zenodo
The report on #datasets from Spanish universities in #Zenodo, with data as of December 2025, is now published. More datasets, but a lower level of description. We must not let our guard down. University libraries need to do something about it. www.javima.info/ciencia-abie...
#CienciaAbierta
👀 📣 To all users of eye-tracking-while-reading datasets: check out our comprehensive, filterable dataset overview!
Dataset overview: dili-lab.github.io/datasets.html
Preprint: arxiv.org/abs/2602.19598
Add or edit your dataset: www.cl.uzh.ch/en/research-...
#FAIR #eyetracking #datasets
"By analyzing massive #datasets ... #researchers uncovered networks involving “paper mills,” brokers, and compromised journals that systematically produce and sell fake #research, authorship slots, and #citations.": buff.ly/YJ4bqBU
via sciencedaily
#science #MedSky #research #ResearchJournals
Enter 100% verified active #AustriaWhatsApp #numberdata from trusted #WhatsAppDatabase companies. These premium #datasets offer a #gamechanging solution for #telemarketing and direct call marketing #campaigns, delivering unmatched accuracy and ROI.
buywhatsappdatabase247.blogspot.com/2026/03/aust...
The scryptIQ #machinelearning module covers both supervised and unsupervised learning methods: namely the classification and clustering of different #biological #datasets, including images.
scryptiq.ai
Science is more than papers
153M+ research outputs in the #OpenAIREGraph are linked to #datasets & #software
A growing web of connections allowing us to see how knowledge is built across publications, data & code, not just the final paper.
Explore connections
🔗 #GraphAPI shorturl.at/oRotk
🔗 #OpenAIRE EXPLORE shorturl.at/RIZoh
New #J2C Certification:
Probabilistic Pretraining for Improved Neural Regression
Boris N. Oreshkin, Shiv Kumar Tavker, Dmitry Efimov
https://openreview.net/forum?id=F6BTATGXaf
#datasets #tabpfn #regression
BGS' BritPits map shows the distribution of worked mineral commodities across the UK - tinyurl.com/5ydmtaf6
#Aspermont #BritishGeologicalSurvey #BritPits #MineralResources #MineralPlanningAuthority #Geology #Datasets
"From Reflection to Repair: A Scoping Review of Dataset Documentation Tools" (new preprint via arXiv) arxiv.org/abs/2602.15968 #data #datasets #rdm
Discussing AI in the sphere of geological modelling with respect to the tunnelling industry - tinyurl.com/54bxc7bs
#Aspermont #COWIfonden #UniversityofStrathclyde #TechnicalUniversityofDenmark #COWI #AI #Tunnelling #GroundInvestigation #DataSets #GeologicalModelling
How can AI classify multilingual research datasets?
doi.org/10.1108/EL-0...
Why read? It shows a practical pipeline using a fine-tuned Qwen2 to assign CLC codes to multilingual datasets.
Next step: More detailed cross-language evaluation (authors).
#ShortReview #AI #LLM #Classification #Datasets
Industry holds some of the richest #ocean #datasets — yet only 3% reach global #biodiversity repositories (Tides of Transparency, 2024).
📺 Ocean Literacy Webinar 2
🗓️ 17 March 2026 | Online
Register now on our website! 🔗 tinyurl.com/3993rj9t
#agentarium
#intelligence_module
#cognitive_infrastructure
#vdb
#ai
#data
#datasets
#agenticai
#rag
#graphrag
Occam’s Razor for SSL: Memory-Efficient Parametric Instance Discrimination
Eric Gan, Patrik Reizinger, Alice Bizeul et al.
Action editor: Georgios Leontidis
https://openreview.net/forum?id=GFNTbsVFlP
#supervised #regularization #datasets
1) Do #datasets have #DOIs? How are #data cited?
"At Pensoft we can do it in 2 ways: authors can cite both Data Papers and/or #Dataset. We recommend to cite both, and this is in our opinion the right way to do that" - Prof. Penev.
#lovedata26
@lovedataweek.bsky.social
AllenAI Introduces #AutoDiscovery: Automated Scientific Discovery Now Available in Asta Labs allenai.org/blog/autodis... #AI #datasets #data @ai2.bsky.social #research