The place where the words are

Category: Reflections

This is the category to apply to your Weekly Reflection posts from the course.

Datafication and Overfitting

This blog post will go through the nature of the datafication of many elements of the modern world, and apply a principle of machine learning to describe the problems through the analogy of a system that processes data.

Datafication is the process of converting elements of our lives into discrete and quantifiable instances. An example of this could be something simple, such as what you purchased the last time you went grocery shopping: every item on a receipt, each with its associated product ID and price, arranged in a neat row matrix. As the influence of technology, particularly the so-called ‘big data’, has increased, many more things have been tracked and converted to convenient lists of values.

The primary objective of datafication is, of course, profit. By determining what items you are more likely to buy, and when you are more likely to buy them, an advertiser can subtly place the right product in the right place at the right time. The same principle holds for other areas than advertising, such as land development, where the value of establishing housing in different areas can be assessed based on the values of nearby construction, or grocery store inventory software, which can track the number of products coming into and going out of a store, and automatically order common items.

Datafication is actually beneficial in many areas. Patterns which might not be immediately apparent at a glance can appear with a large dataset, and finding patterns is not inherently bad. Medical diagnoses, for example, can benefit from the introduction of data agglomeration systems, such as those used for patient pre-screening in an ICU.

The problems with datafication arise from two main areas: misuse and overfitting. Misuse of data, such as to track and manipulate people, is regrettably commonplace in the modern world. Technology is, after all, controlled by the people who own it, and you don’t generally get rich by taking the kindest and most ethical path available to you. Overfitting is another area where datafication can be dangerous, though.

Overfitting is a problem in machine learning which occurs when a machine becomes too good at following the data that it is using. Ordinarily, getting better at something is considered to be a good thing, but overfitting occurs when a machine learns to replicate patterns which are present in the training data but which don’t exist outside of it. For example, if you were to track every car accident which took place across the globe, the colors of the cars would probably just match the most common car colors across the globe. If you were to take only a few crashes, though, there is a high chance that the colors of the cars would not match the most common colors of cars on the road, which may erroneously lead to the conclusion that car color is strongly correlated with car crashes.

Overfitting diagram
Image credit to Dake~commonswiki, sourced under creative commons share-alike 3.0. No edits made
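
The car color example can be tried out in a few lines of Python. This is only a rough sketch: the color shares are numbers I made up for illustration, and the simulation assumes that color has no effect at all on the chance of a crash. It compares how far the colors in a handful of crashes drift from the colors on the road, against how far they drift in a very large number of crashes.

```python
import random
from collections import Counter

# Hypothetical share of each car color on the road (illustrative values only).
POPULATION_SHARES = {"white": 0.35, "black": 0.25, "gray": 0.25, "red": 0.15}


def sample_crash_colors(n, rng):
    """Draw the colors of n crashed cars, with color having no effect on risk."""
    colors = list(POPULATION_SHARES)
    weights = list(POPULATION_SHARES.values())
    return rng.choices(colors, weights=weights, k=n)


def max_deviation(sample):
    """Largest gap between a color's share in the sample and its share on the road."""
    counts = Counter(sample)
    return max(
        abs(counts[color] / len(sample) - share)
        for color, share in POPULATION_SHARES.items()
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
small = max_deviation(sample_crash_colors(5, rng))
large = max_deviation(sample_crash_colors(100_000, rng))
print(f"5 crashes: off by up to {small:.3f}")
print(f"100000 crashes: off by up to {large:.3f}")
```

With only five crashes, the sample cannot even represent a 35% share, so some color always looks over- or under-represented. A model trained on those five crashes would be learning a pattern that only exists in the sample.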

The same type of thing can happen in datafied systems, where parts of those systems which are more quantifiable are more attentively managed, since more is known about them. Things which are difficult to convert to data may end up left by the wayside, since they cannot appear on a spreadsheet.

An example of this occurs with people who don’t have access to the internet. They are naturally omitted from any form of online polling and tracking, so their needs and interests will not be factored into societal plans as easily as those of others.

This is similar to overfitting, in that the data which the societal model has access to may form patterns which do not give the full picture. The solution to this problem in machine learning can also have some benefit here. The solution to overfitting is ordinarily to keep some data separate, and use it to determine when the model is beginning to fall into irrationality. This can be done with datafication by not relying exclusively on big data for all of the answers. That said, data is cheap, and all of the other options cost more, so this is more of a hypothetical solution.
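
For the curious, keeping some data separate looks roughly like this in Python. It is a sketch under assumptions of my own: the data is invented (a straight line plus random noise), and the "memorizer" is a deliberately overfit stand-in model which just repeats the closest training example. Neither comes from any particular library or real system.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: y depends on x through a simple line plus random noise.
points = [(i / 10, 2 * (i / 10) + rng.gauss(0, 1)) for i in range(200)]
rng.shuffle(points)
train, held_out = points[:150], points[150:]  # the held-out part is never trained on


def fit_line(data):
    """Ordinary least-squares line through the data; returns a predictor."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_memorizer(data):
    """Overfit on purpose: answer with the y of the closest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, data):
    """Average squared gap between the model's guesses and the real values."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name}: training error {mean_squared_error(model, train):.3f}, "
          f"held-out error {mean_squared_error(model, held_out):.3f}")
```

The memorizer scores a perfect zero on the data it has seen, which looks wonderful until it meets the held-out data. That gap between the two numbers is the warning sign, and it is only visible because some data was kept aside.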

Annotation

(This post is better with annotations enabled; use hypothes.is for the best viewing experience.)

Annotation in online spaces is an interesting concept, but I’ll admit that I don’t see a large amount of practicality. For note taking, while having quick notes on a website itself is useful, having to navigate around to look at them reduces their effectiveness as an actual note taking method, in my opinion, making it a good choice for a review of some online content, but not necessarily for gathering information.

Additionally, the menu for viewing your own annotations doesn’t actually show the annotations themselves, meaning you do have to be on a page for this to be usable.

This only reduces the notetaking potential, though, so there is still use for reviewing content or sharing information with others. To test this, I visited the website formerly known as Twitter, to gather information about what was annotated. I was disappointed to see that there was exactly one annotation on the main page, reading simply, “the best social media site on the internet.”

This pattern did not change when I opened a post by Elon Musk on the front page. I had assumed that since the main page was largely automatically generated at the time of opening the website, there would be very little to annotate, which would explain the lack of annotation. This, however, did not account for the lack of annotation on individual posts. On this tweet by Elon Musk, which is primed for someone to comment on its veracity, there are no comments at all, despite the fact that it was one of the first ones which were shown upon opening the site.

For this reason, I give hypothes.is a 6/10. A solid software for personal review, but perhaps not world changing. I am uncertain how well it works for collaborative efforts.

Digital Literacy Frameworks and Digital Citizenship

Digital literacy is the title of this course, as well as an important skill in other contexts. The BC Post-Secondary Digital Literacy Framework offers a breakdown of what it would be reasonable to expect from someone who wishes to be a good digital citizen. The main points are that it is important to know the best ways to use technology for your own purposes.

This post will mostly be an assessment of myself against those particular standards, but brief descriptions will be included, if you desire to follow along.

Ethical and Legal Considerations
Behaving ethically and legally is described as understanding the principles of behaving in a way which is not harmful to others online, from both a moral and legal perspective.

I believe that I adhere reasonably well to the former, since I am aware of many of the ways in which things such as the lack of access and inclusivity in online spaces can affect people. Since I do my best to avoid interacting with online spaces outside of work, or as a viewer, I do not have any issues with actively harming others.

I will not be commenting on my adherence to any laws.

Technology Supports
Technology supports is described as the ways a digital citizen attempts to use technology for their own benefit.

I largely fulfill the qualifications listed in this document, but I occasionally don’t bother with Strong Unique Passwords, when I’m signing up for something which I don’t care about.

Additionally, while I am capable of learning new technologies, I try not to trust anything created by large tech corporations, such as Meta and Microsoft, where possible.

Information Literacy
Information literacy is the ability to recognize that some information is better than others, and to keep biases in information in check.

I could probably stand to improve this one. While I vet the sources that I use for academic purposes pretty rigorously, I don’t generally take it as seriously on my own time. I am knowledgeable on the topic of subtle biases and misinformation, and try to consider both sides to most arguments. That does not mean that I am immune from the inexorable sway of algorithms which are designed to increase engagement at all costs.

Digital Scholarship
Digital scholarship is the principle of intentionally using digital technologies for scholarly behavior.

I have spent a great deal of time performing research on computers, and am pretty excellent with it. Admittedly, I do not generally interact with open access platforms, as I view most of my work as simply a means to the end of completing an assignment, meaning that I do not bother with publishing.

Communication and Collaboration
Communication and collaboration are measured by the proficiency with which a digital citizen uses technology to make contributions in online spaces.

I do my best to avoid making any type of contribution to any spaces, so if there was a metric for calculating this, I’d probably be pulling a divide by 0 error. That said, when interaction is a requirement, I do so with a positive attitude.

Creation and Curation
Creation and curation are expressed by the ways in which people creatively express themselves online, as well as curate digital media for audiences or platforms.

The extent to which I interact with internet curation is this blog; I prefer not to make any creative projects I undertake available to the wider world. That said, I believe that while this blog may be a bit dry, it adheres to most of the standards listed.

Digital Wellbeing
Digital wellbeing is the set of ways in which people set healthy boundaries with technology usage.

I could probably stand to improve this. The only social media I interact with is YouTube, and I have not made any attempts to harm others online.

Community-Based Learning
Community-based learning measures the usage of technology to facilitate digital collaboration.

Apart from the learning pods from this course, as well as the group chats I have for other group projects, I do not engage with this element of the online experience. I have in the past, but frankly, I’ve found that, in my opinion, it’s generally not worth the effort.

Free Inquiry as Education

In the course posts from a few weeks back, the topic of having education be learner-led rather than educator-led was brought up. Essentially, the educator guides the learner as they construct a plan to learn some type of content, then helps them in following through with that plan.

The system is interesting, since it is distinct from the educator-led system which is used in most settings. For the sake of reference, the current system has an educator create some set of lesson plans, then learners are expected to learn the material from those lessons. This has the advantage of guaranteeing that learners have access to the information they need from an education scheme, but it has the disadvantage that those being educated may be passive observers in their own education, and may only learn information in order to pass a series of tests, rather than to gain the knowledge.

The free inquiry system seems to have the inverse advantages and disadvantages. Learners are more likely to be engaged with what they want to do, but may not cover all the topics which would be considered necessary. The theory behind this system is that many topics require elements which would be unpleasant to learn otherwise, but which become worthwhile when learned in pursuit of a goal.

Personally, while the results seem to show promise, I remain skeptical. I don’t know if this is common for everyone, but given the option to educate myself in any topic, I would certainly only choose things which seemed easy to me. In the past, my high school law class functioned with a model somewhat similar to this one, and while it was freeing, I would usually just turn in reviews of case law which I could throw together in half an hour, and then spend the rest of the class playing Moto X3M.

Generative Artificial Intelligence

In order to not make this article a rant about global capitalism, contemplation of the future of AI will not be included in this post.

Generative AI is a type of AI which collates a large amount of data into a probability distribution, then roughly resynthesizes the data by drawing at random from that distribution. Naturally, some information from the original data is lost, but the ability to create new, similar data is the objective.

The apple is converted into a distribution of data which can be used to reconstruct an image of it later. Apple image credit to Agnes Monkelbaan (CC Share Alike 4.0)
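
The same idea can be shown with text instead of apples. The sketch below is a toy of my own making, not how any real generative AI product is built: it collates a tiny, invented corpus into a table of which word follows which, then resynthesizes text by drawing the next word at random from that table.

```python
import random
from collections import Counter, defaultdict

# A tiny hypothetical training corpus, standing in for "a large amount of data".
corpus = "the apple is red . the apple is round . the sky is blue .".split()

# Collate: count which word follows which, giving one distribution per word.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1


def generate(start, length, rng):
    """Resynthesize text by repeatedly drawing the next word at random."""
    words = [start]
    for _ in range(length - 1):
        options = followers.get(words[-1])
        if not options:  # the word never had a successor in the corpus
            break
        choices, weights = zip(*options.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return words


print(" ".join(generate("the", 8, random.Random(0))))
```

Every pair of neighboring words in the output appeared somewhere in the corpus, so the result always looks plausible. Nothing stops the model from producing "the sky is round", though, which was never in the data. The table only knows what is likely, not what is true.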

One of the many issues with generative AI models is that there is no requirement or expectation of any level of usability of generated outputs. From the perspective of the computer, the only measure of usefulness is the degree to which the output resembles the input. This can lead to problems such as “hallucinations,” where AI text generators create facts or references which are verifiably false. This issue, however, is user error, since generative AI systems do not have any actual relationship with truth, except where truth happens to correlate with likelihood.

This can be a substantial issue in the area of education, since many people treat the AI as if it were a person. It is not. It has no feelings, and the fact that it can speak does not mean that it possesses anything resembling a soul. Because people treat AI with respect, they can often be deceived by the incorrect results it can generate. It should be reiterated that incorrect, in this case, does not mean that the AI did anything which it was not designed to do, simply that it was not designed to produce factual information. Not everyone is aware of this fact, though, and both students and educators may end up using AI to perform research or create media. As long as this is done with care, and the information produced by the AI is verified, this poses no issue, but the information is not always verified after someone generates it.

Another issue with AI is, of course, the fact that AI companies are notorious for not gathering data in an ethical manner. In addition to public domain works, such as ancient books and paintings, generative AI models are trained on essentially any samples which they can gain access to, with or without permission. These articles are examples of lawsuits related to the training of AI on copyrighted materials, something which is already well documented.

Perhaps someday, this article, too, will become part of the homogenous slurry which composes generative AI.

Copyright, Intellectual Property, and the Creative Commons

A photo chosen at random from the Wikimedia Commons, originally taken by user Hubertl

Copyright law is a long and complicated topic, just like any other law. But unlike some other laws, it is extremely easy to break it without meaning to. This post will largely be a breakdown of several relevant copyright types which you might encounter on your long trek through the arid wasteland we call the Internet.

In general, most material that exists comes with the standard “don’t touch me” type of copyright. This includes anything which costs money (with a few bizarre exceptions), most works of art (when not posted on social media platforms with copyright circumvention clauses), and, in fact, essentially everything, unless stated otherwise. Usually this comes with a warning label such as “all rights reserved,” but the absence of any label doesn’t imply the absence of any threat. These works can still be used in some cases, defined in part III of the Canadian Copyright Act. These cases are: research, private study, education, parody or satire, criticism, and news reporting. Even in those cases, be sure to only use the minimum amount required, and give a proper citation.

Apart from those earlier materials, though, there are some that use the Creative Commons or other similar licensing strategies. These licenses are used to give other people more rights in terms of using works, with the exact degree of use being decided by which Creative Commons license is used. These vary from allowing free use for whatever purpose a user wishes, to only being able to freely view something. There are other similar licenses, such as the GNU General Public License, which is commonly used for software.

The distinction is important when using online and offline materials, since other people’s work should be respected, in addition to the fact that misuse can have negative effects ranging from a moderate inconvenience to losing a substantial amount of money. Overall, copyright serves the valuable purpose of protecting our work from use by bad actors who would use it in ways which we don’t want, and the option to give options on how exactly something can be used is a great benefit to the world at large.

© 2025 Blo g
