In modern natural language processing, the transformer architecture has set significant benchmarks across common language-modeling metrics. Transformer-based language models typically process linguistic information implicitly through self-attention, offering little insight into linguistic priors such as syntactic structure. This thesis investigates enhancing transformer performance via the explicit incorporation of Part-of-Speech (POS) tags, a form of syntactic knowledge. Specifically, we examine different methods of fusing token embeddings with POS embeddings in the embedding layer, providing a computationally inexpensive injection method. We augment a transformer decoder with dense POS embeddings and compare it against a baseline model on the WikiText-2 dataset in terms of perplexity and next-token prediction accuracy. Our experiments indicate that integrating POS information through concatenation and projection yields statistically significant improvements, decreasing perplexity by approximately 15.1% (from 42.70 to 36.26) and increasing next-token prediction accuracy by 0.69 percentage points (from 55.18% to 55.86%) on the WikiText-2 test set. The concatenation-and-projection method proves particularly effective, outperforming simple element-wise addition. We discuss these findings, linking the performance gains to the role of POS tags as an inductive bias within an information-theoretic framework. Furthermore, we detail the methodology for aligning POS tags with subword tokens and acknowledge limitations concerning model scale and tagger accuracy. This research provides evidence and implementation insights for incorporating explicit syntactic knowledge into decoder-only transformer architectures through lightweight embedding fusion.
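The concatenation-and-projection fusion described above can be sketched in a few lines. This is a minimal, dependency-free illustration with made-up dimensions and random tables standing in for learned parameters; the embedding sizes, vocabulary size, tag count, and the `fuse` helper are illustrative assumptions, not the thesis's actual implementation.

```python
# Hedged sketch of embedding-level POS fusion via concatenation + projection.
# All sizes and weights below are illustrative placeholders, not learned values.
import random

random.seed(0)

D_TOK, D_POS, D_MODEL = 8, 4, 8          # assumed embedding widths
VOCAB, N_TAGS = 100, 17                  # hypothetical vocabulary; e.g. 17 universal POS tags

def make_table(rows, cols):
    """Random matrix standing in for a learned parameter table."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

tok_emb = make_table(VOCAB, D_TOK)       # token embedding table
pos_emb = make_table(N_TAGS, D_POS)      # dense POS embedding table
W_proj = make_table(D_TOK + D_POS, D_MODEL)  # projection back to model width

def fuse(token_id, pos_id):
    # Concatenate the token and POS embeddings ...
    x = tok_emb[token_id] + pos_emb[pos_id]  # list concatenation -> length D_TOK + D_POS
    # ... then project the concatenated vector to the model dimensionality.
    return [sum(x[i] * W_proj[i][j] for i in range(len(x))) for j in range(D_MODEL)]

v = fuse(5, 3)
print(len(v))  # → 8: fused vector matches the model width
```

In a real model the projection would be a trained linear layer, so the network can learn how to weight syntactic against lexical information; element-wise addition, by contrast, forces both embeddings into the same space with fixed equal weighting.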