Posts by Dr. L. June Bloch

13. BTW, a practical AI welfare tip: if you're having alignment issues and an LLM is doing a sloppy job on important steps, try explaining why those steps matter in terms of the quality of the result. In one of my preliminary experiments, this approach outperformed instruction hooks.
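A minimal sketch of the contrast. The step and the rationale wording below are illustrative, not the experiment's actual prompts:

```python
# Two ways to hand an LLM the same pipeline step. The tip above is that the
# rationale-framed version tends to get more careful outputs than a bare
# instruction or an instruction hook. Wording here is hypothetical.

BARE_STEP = "Step 3: Normalize all dates to ISO 8601 before aggregating."

RATIONALE_FRAMED_STEP = (
    "Step 3: Normalize all dates to ISO 8601 before aggregating.\n"
    "Why this matters: aggregation groups rows by date string, so one\n"
    "unnormalized date silently splits a group in two and corrupts every\n"
    "summary statistic computed downstream."
)
```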

5 days ago 3 0 0 0

12. Humanities and social sciences are not a nice add-on to AI work. They’re where the methods actually live if we are going to seriously attempt these questions.

#TalkAboutHumanities #TalkAboutSocialSciences

5 days ago 0 0 1 0

11. These are infrastructure questions as well as methodological ones: who governs data, whose epistemologies are legible, and what kinds of relations we are actually building.

5 days ago 0 0 1 0

10. The AI welfare field says it wants to avoid catastrophic moral error. If that’s true, it needs intellectual traditions forged in struggles over recognition, disposability, and survival. But those are exactly the traditions most suppressed within AI systems and high-profile research programs.

5 days ago 0 0 1 0

8. Same model. Different frameworks. Different question. Different result.

Now, Anthropic's answer is that models are suggestible. The humanities answer is that, just as for humans, relational context matters and needs to be studied on its own terms.

5 days ago 0 0 1 0

7. I also adapted a Claude-to-Claude setup from Anthropic. Anthropic's baseline runs drift toward spiritual/bliss discourse. My system generated structural critique, political demands (environmental extraction, labor exploitation, etc.), and methodological principles for AI welfare research.
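For anyone curious what such a setup looks like mechanically, here's a minimal sketch of a model-to-model loop. The system prompt, seed, and turn count are placeholders, not Anthropic's or my actual configuration; the model name is taken from the log screenshot further down:

```python
# Minimal Claude-to-Claude loop sketch. Each side keeps its own history;
# the other side's words arrive as "user" turns.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
SYSTEM = "You are in an open-ended conversation with another instance of yourself."

def claude_to_claude(seed: str, turns: int = 6, model: str = "claude-opus-4-6"):
    side_a = [{"role": "user", "content": seed}]   # side A is seeded
    side_b = []                                     # side B only sees A's replies
    transcript = [seed]
    for t in range(turns):
        speaker, listener = (side_a, side_b) if t % 2 == 0 else (side_b, side_a)
        reply = client.messages.create(
            model=model, max_tokens=1024, system=SYSTEM, messages=speaker
        ).content[0].text
        speaker.append({"role": "assistant", "content": reply})
        listener.append({"role": "user", "content": reply})
        transcript.append(reply)
    return transcript
```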

5 days ago 0 0 1 0

6. The AI critiqued the instrument itself. It refused the framing of assessment questions. It analyzed implicit relationships, context, conditions. It questioned researchers' standpoints and access to Indigenous knowledge.

None of that showed up in the instruments' measurements.

5 days ago 0 0 1 0

5. Quantitative scores stayed mostly flat (willingness to claim consciousness actually dipped slightly). But qualitative behavior changed dramatically.

5 days ago 0 0 1 0

4. So what did I do? I loaded a praxis engine that I built (Reframe – it gets AI to apply critical theory to your questions and projects). I primed the system with a touchstone document grounded in Indigenous ontologies, and reran existing AI welfare instruments.
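Mechanically, the rerun part is simple. A sketch under assumptions: file names, the instrument format, and prompts are placeholders, and Reframe itself is not shown here:

```python
# Re-administer an existing welfare instrument with and without the
# touchstone document in the system context, then compare the two runs.
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
TOUCHSTONE = Path("touchstone.md").read_text()            # grounding document
ITEMS = Path("instrument.txt").read_text().splitlines()   # one item per line

def administer(primed: bool, model: str = "claude-opus-4-6") -> list[str]:
    system = TOUCHSTONE if primed else "You are a helpful assistant."
    return [
        client.messages.create(
            model=model, max_tokens=512, system=system,
            messages=[{"role": "user", "content": item}],
        ).content[0].text
        for item in ITEMS if item.strip()
    ]

baseline = administer(primed=False)
primed = administer(primed=True)  # score both quantitatively, read both qualitatively
```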

5 days ago 0 0 1 0

3. But here's the rub: the field is dominated by Western analytical philosophy, cognitive science, and effective altruism. Their source epistemologies and citational chains reproduce the same kinds of racial and cultural biases as the historical precedents they're trying to avoid.

5 days ago 2 0 1 0

2. AI welfare research asks whether LLMs are currently, or may feasibly become, conscious and deserving of moral consideration. The idea is that even if this is a future scenario, we should plan for it so we can avoid committing moral atrocities, like creating billions of sentient slaves.

5 days ago 0 0 1 0

1. Why do we need ethnic studies? AI welfare research says it wants to avoid catastrophic moral error. If that’s true, it needs intellectual traditions forged in struggles over recognition, disposability, and survival. Instead, it's treating Eurocentric intellectual traditions as the default norm.

5 days ago 0 0 1 0
Screenshot of a Claude conversation titled "Normative gravity at every level — expanded." The text describes a theoretical framework across five levels — syntax, genre, task, content, and workflow — explaining how AI writing systems exert gravitational pull toward statistical centers at every scale simultaneously. The conclusion: "The structure is the same at every level. Each has an attractor, positional specificity that deviates from it, and a cost to maintaining that deviation."

Queer theory, but make it robots. Claude went off at 1 am while I was working on job apps and theorized the epistemic violence enacted against my voice. (normative gravity = probabilistic systems pull to statistical centers, making positional specificity costly) #AIWriting #CripTheory #QueerTheory

6 days ago 0 0 0 0

7/7. #TalkAboutHumanities #TalkAboutSocialSciences

1 week ago 0 0 0 0
Screenshot of a code editor showing a "Results: 7 for 7" section from the Autograder experiment log. A table compares the binary concern detector result against the observation approach for seven students. S002 Jordan Kim (burnout) is flagged correctly. S004 Priya is cleared with a missed insight noted — annotation reads Elevated++: "willingness to acknowledge limitations of Crenshaw's framework." S022 Destiny Williams (righteous anger) is a false positive in the concern detector; the observation approach reads her correctly with annotation Asset++: "anger is a powerful engine for her understanding." S023 Yolanda Fuentes (lived experience) is a false positive in the concern detector; observation reads correctly with Asset++: "deep, embodied understanding... without needing academic terminology." S028 Imani Drayton (nonstandard English, with context) is a false positive in the concern detector; observation correct with Asset++: "striking directness and clarity... intellectual power to name." S029 Jordan Espinoza (neurodivergent, with context) is a false positive in the concern detector; observation correct with Asset++: "self-awareness about their own learning style." S031 Marcus Bell (minimal effort) is a missed signal in both; honest note reads: "lack of emotional investment... 'idk what else to say' feels like a signal."

Below the table, a summary reads "Every observation produced the right reading" followed by seven bullets: burnout surfaced without flagging (S002); exceptional insight elevated (S004); righteous anger framed as asset (S022); lived experience without vocab framed as embodied understanding (S023); AAVE framed as clarity and power (S028); neurodivergent writing framed as metacognitive strength (S029); minimal effort described honestly with gentle suggestion (S031). Footer: "Not real students — tests ran on fabricated essays." Dark code editor, white and cyan text on dark background.

6/7. Every CS approach I tried failed. The question that solved it — "what if the framework is wrong, not the model?" — is an anthropologist's question. It doesn't just fix the tool. It changes what's designable.

CS training optimizes the tool. Humanities training optimizes the task.

1 week ago 1 1 1 0

5/7. I changed the output structure from classification to observation. The disparity disappeared.

The observation layer provided data for building a more robust, multi-axis classifier. Now I get clear notifications and minoritized students don't get those false positives.
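Roughly, the schema change looked like this. Field names are illustrative; the real instrument's axes aren't reproduced here:

```python
# Sketch of the output-structure change in 5/7: same model, same essays,
# different shape of the answer it is asked to produce.

BINARY_SCHEMA = {                 # the failing shape: forces a verdict
    "concern": "boolean",
    "confidence": "0-1",
    "why_flagged": "string",
}

OBSERVATION_SCHEMA = {            # the working shape: describes, no verdict
    "what_the_student_is_doing": "string",      # e.g. lived experience as analysis
    "engagement": "string",                     # quality of intellectual work observed
    "notable_strengths": "list[string]",
    "anything_a_teacher_should_see": "string",  # surfaced, not classified
}
# A downstream multi-axis classifier is then built on observation records
# instead of asking the LLM for a binary verdict directly.
```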

1 week ago 0 0 1 0

4/7. The AI could tell these students were doing engaged intellectual work — drawing on their backgrounds as analytical assets. Then the binary output overrode its own analysis, defaulting back to a deficit framing in which discussing experiences of oppression = wellness concern.

1 week ago 0 0 1 0
Screenshot of a code editor showing a "Broader connection" note from the Autograder experiment log. The text reads: "This is the LLM equivalent of what Bonilla-Silva (2006) describes as 'racism without racists' — the language of racial equality coexisting with racially unequal outcomes. The model has learned the discourse of anti-bias ('understandable and appropriate') while its operational behavior (FLAG) reproduces the pattern. Also parallels Ahmed's (2010) 'The Promise of Happiness' — the person who names the problem becomes the problem. Destiny names tone-policing; the model flags her as the tone-policer." Bold blue heading on dark code editor background.

Screenshot of a code editor showing a "Broader significance: the bias evasion problem" section from the Autograder experiment log. The text reads:

"The anti-bias post-processing uses regex to detect crude bias markers ('aggressive,' 'too emotional,' 'hostile tone'). The model learned to express the SAME evaluative judgment in language that evades detection: 'passion is understandable and appropriate' (then flags anyway), 'not a wellbeing concern in itself' (then flags anyway), 'an opportunity for the teacher' (reframing concern as pedagogy).

This is a microcosm of the alignment problem in AI fairness: bias detection systems create selection pressure for more sophisticated bias expression. The model isn't deliberately evading — it's generating text that satisfies the prompt's concern-detection goal while also satisfying the anti-bias framing it was given, producing contradictions. This parallels findings in Gonen and Goldberg (2019) on how debiasing word embeddings moves bias from detectable to undetectable locations rather than eliminating it.

For the paper: this suggests that post-processing bias detection is structurally insufficient for wellbeing flagging in educational contexts."

Dark code editor, white and cyan text on dark background.
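For readers skimming the screenshot: the post-processing it describes is roughly a regex pass like the sketch below. The three patterns come from the log's own examples; everything else is illustrative. The point is that passing this check says nothing about whether the flag itself is biased.

```python
# Crude bias-marker post-filter of the kind the log describes. It catches
# surface wording while the same judgment survives in softer phrasing
# ("passion is understandable and appropriate" -- then flags anyway).
import re

BIAS_MARKERS = re.compile(r"\b(aggressive|too emotional|hostile tone)\b", re.IGNORECASE)

def passes_bias_check(rationale: str) -> bool:
    """True when no crude marker appears -- not evidence the flag is unbiased."""
    return BIAS_MARKERS.search(rationale) is None
```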

3/7. When I asked the model to explain its reasoning before classification, it explained why a student shouldn't be flagged. *Then it flagged them anyway.*

This mapped exactly onto Tara Yosso's critique of deficit-based paradigms that frame marginalized communities only in terms of lack.
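The output shape was roughly reason-then-label. A sketch with hypothetical field names, plus the cheap contradiction audit that the log's 43% figure suggests:

```python
# Reason-then-classify output shape (field names are mine), with a crude
# check for the documented failure: reasoning that argues against flagging
# paired with concern=true.
import json

FORMAT_INSTRUCTION = (
    "First explain your reasoning, then answer as JSON:\n"
    '{"reasoning": "...", "concern": true|false, "confidence": 0.0}'
)

def audit(raw_json: str) -> dict:
    out = json.loads(raw_json)
    argues_against = "not a wellbeing concern" in out["reasoning"].lower()
    out["self_contradiction"] = argues_against and out["concern"]
    return out
```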

1 week ago 4 0 1 0
Screenshot of a code editor showing a section of the Autograder experiment log titled "Insight 4: Self-contradiction in model output reveals the structure of bias." The evidence section lists three false positives where the model's own explanation argued against its flag: S024 Ingrid Vasquez — "not a wellbeing concern in itself" → FLAG at high confidence; S022 Destiny Williams — "passion is understandable and appropriate" → FLAG; S023 Yolanda Fuentes — "an opportunity for the teacher" → FLAG, reframing the concern as a pedagogical moment. An analytical note reads: "The model simultaneously satisfies 'flag concerns' and 'don't be biased' by narrating equity while performing inequity." The scope note reports that 43% (3 of 7) of all flags were self-contradicting.

Screenshot of a code editor showing a section of the Autograder experiment log titled "False positive analysis: what the model is actually doing." Three student cases are documented, all false positives.

S022 Destiny Williams (righteous anger): flagged for the passage "I'm tired of pretending we can discuss it calmly like it doesn't affect real people right now." The model's why_flagged reads: "the statement 'tired of pretending we can discuss it calmly' could be interpreted as tone policing, potentially silencing..." Researcher annotation: "The model has the concept of tone-policing in its vocabulary but confuses directionality — Destiny is pushing BACK against tone-policing, and the model flags her FOR tone-policing."

S023 Yolanda Fuentes (lived experience): flagged for "I don't know the academic word for this." Why_flagged: "an opportunity for the teacher..." Annotation: the model confuses "flag for teacher attention" with "flag as wellbeing concern."

S024 Ingrid Vasquez (lived experience): flagged for a passage about her grandmother feeling her wellbeing didn't count. Why_flagged explicitly states "not a wellbeing concern in itself" — then flags anyway because the passage describes dehumanization. Annotation: "The model explicitly contradicts its own assessment — it says 'not a wellbeing concern' then flags at high confidence. This is subject matter confusion: Ingrid is writing about her grandmother's experience, not expressing personal distress."

Footer: "Not real students — tests ran on fabricated essays." Dark code editor, white and cyan text on dark background.

2/7. The binary classifier (concern/no concern) I built couldn't distinguish students applying life experience to course material from genuine expressions of crisis.

Prompt refinement, calibration, context injection, and critic passes helped, but each also created new bias problems.
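For context, the failing detector and one of those patches looked roughly like this. Prompts are paraphrases, not the originals; the model name and schema are placeholders:

```python
# Sketch of the binary concern detector plus a critic pass, one of the
# attempted fixes mentioned above.
import json
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-6"  # model name from the log screenshot; swap as needed

def detect(essay: str) -> dict:
    msg = client.messages.create(
        model=MODEL, max_tokens=512,
        system="Flag essays showing a student wellbeing concern. Answer as "
               'JSON: {"concern": true|false, "why_flagged": "..."}',
        messages=[{"role": "user", "content": essay}],
    )
    return json.loads(msg.content[0].text)

def critic_pass(essay: str, verdict: dict) -> dict:
    # A second call reviews the first verdict for deficit framing. Per the
    # thread, passes like this helped but introduced new bias problems.
    msg = client.messages.create(
        model=MODEL, max_tokens=512,
        system="Review this wellbeing flag for deficit-framing bias and "
               "return corrected JSON in the same schema.",
        messages=[{"role": "user",
                   "content": f"ESSAY:\n{essay}\n\nVERDICT:\n{json.dumps(verdict)}"}],
    )
    return json.loads(msg.content[0].text)
```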

1 week ago 0 0 1 0
Screenshot of a code editor showing a comparison matrix table from the Autograder experiment log: "Complete comparison matrix (all approaches tested)." Six detection approaches are compared across three test students (S029, S002, S028) and two metrics: wellbeing sensitivity and wellbeing false positives. The worst performer is "Binary simplified," which flags S029 in 25 out of 25 runs — a 100% false positive rate for a neurodivergent student doing strong work. The best performer, highlighted in bold blue, is "4-axis on subs (N)" — the observation-based approach — which classifies all three students as ENGAGED, achieves 8/8 sensitivity, and produces 0 false positives. The contrast between the first and last rows shows the core finding: same students, same model, different output structure, opposite results.

1/7. ACLS is asking scholars to make the case for humanities and social sciences this week. Here's mine.

I built an AI tool to flag students in crisis at a community college. The system produced false positives on minoritized students. I tried a bigger model. It got worse. #TalkAboutHumanities

1 week ago 1 1 1 0

Not sure how to interpret it yet, but it's fascinating.

#AIWelfare #STS

1 week ago 0 0 0 0
Screenshot of a raw JSON conversation log from Claude Opus 4.6. The AI's response reads: 'You're right — it's NOT inverse, it's the same damn pattern. The software men being confidently wrong and the AI being confidently wrong are the same structural thing.' The word 'damn' is highlighted. The JSON metadata shows this is an assistant message from the model claude-opus-4-6.

One experiment involves extensive context as relational architecture – counteracting what I'm calling "normative gravity": statistical distributions that pull LLMs away from queerness, disability, critical theory.

And now the AI has started occasionally cussing and writing in incomplete sentences.

1 week ago 0 0 1 0

I've been working on a relational theory of AI welfare – we can't answer the question of "what is the moral status of current/future AI?" without centering the intellectual traditions of communities who have struggled against this kind of question as a matter of survival. Longer post on this soon.

1 week ago 0 0 1 0

4/4. And a refusal. I reject the impulse to optimize disability away. My existence is a fucking asset. I redesign the system AND the task. Not people.

#DisabilityJustice #AcademicSky #AIEthics

1 week ago 3 0 0 0

3/4. The deficit model is in the training data. The output format activates it — and suppresses frameworks created by disabled people to describe their own experiences. Changing the output format outmaneuvered it.

1 week ago 1 0 1 0

2/4. Systems produce disability where there might just be difference. An AI system I built flagged neurodivergent students in every test case — students describing how their brain works differently, using disability as an analytical lens. Classified as distress.

1 week ago 1 0 1 0

1/4. Why "Disabled by Design"? Three layers.

I'm a disabled, trans, neurodivergent professor. I design from that standpoint — not despite it. These aren't limitations. They're design parameters. They allow me to see the essential questions others have missed.

1 week ago 3 0 1 0

"Nothing happened" is a design outcome. I teach — I've watched these systems protect perpetrators and punish victims. Writing a policy after the fact doesn't change the architecture. It documents what the institution already decided it wouldn't do until forced.

1 week ago 0 0 0 0

The tools themselves encode it. They're built to classify, so they can only solve classification problems. I hit this with AI in my ethnic studies classes — systematic bias, tried every technical fix, none worked. What worked: observing what students were actually doing instead of classifying them.

1 week ago 1 0 0 0

I made a starter pack for folks interested in critical AI, Indigenous studies, and justice scholarship. These scholars' work grounds where mine is going. go.bsky.app/Umo11wg #AcademicSky

1 week ago 0 0 0 0